Related papers: Efficient Encoder-Decoder Transformer Decoding for…

Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models

Recently, Transformer-based encoder-decoder models have demonstrated strong performance in multilingual speech recognition. However, the decoder's autoregressive nature and large size introduce significant bottlenecks during inference.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-28 Yunkyu Lim, Jihwan Park, Hyung Yong Kim, Hanbin Lee, Byeong-Yeol Kim

Encoder-Decoder Shift-Reduce Syntactic Parsing

Starting from NMT, encoder-decoder neural networks have been used for many NLP problems. Graph-based models and transition-based models borrowing the encoder components achieve state-of-the-art performance on dependency parsing and…

Computation and Language · Computer Science 2017-06-27 Jiangming Liu, Yue Zhang

Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

Discrete diffusion models enable parallel token sampling for faster inference than autoregressive approaches. However, prior diffusion models use a decoder-only architecture, which requires sampling algorithms that invoke the full network…

Machine Learning · Computer Science 2025-10-28 Marianne Arriola, Yair Schiff, Hao Phung, Aaron Gokaslan, Volodymyr Kuleshov

DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

Encoder-decoder transformer models have achieved great success on various vision-language (VL) tasks, but they suffer from high inference latency. Typically, the decoder takes up most of the latency because of the auto-regressive decoding.…

Computer Vision and Pattern Recognition · Computer Science 2023-11-16 Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha

General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference

The state of the art on many NLP tasks is currently achieved by large pre-trained language models, which require a considerable amount of computation. We explore a setting where many different predictions are made on a single piece of text.…

Computation and Language · Computer Science 2020-04-30 Jingfei Du, Myle Ott, Haoran Li, Xing Zhou, Veselin Stoyanov

Hyperdecoders: Instance-specific decoders for multi-task NLP

We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-efficient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder. This approach produces a unique decoder…

Computation and Language · Computer Science 2022-10-19 Hamish Ivison, Matthew E. Peters

FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference

Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state-of-the-art on many knowledge-intensive NLP tasks. However, the architecture used for FiD was chosen by making minimal modifications to a standard…

Computation and Language · Computer Science 2023-06-06 Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, William Cohen

Plug-and-Play Document Modules for Pre-trained Models

Large-scale pre-trained models (PTMs) have been widely used in document-oriented NLP tasks, such as question answering. However, the encoding-task coupling requirement results in the repeated encoding of the same documents for different…

Computation and Language · Computer Science 2023-05-30 Chaojun Xiao, Zhengyan Zhang, Xu Han, Chi-Min Chan, Yankai Lin, Zhiyuan Liu, Xiangyang Li, Zhonghua Li, Zhao Cao, Maosong Sun

Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations

In NLP, a large volume of tasks involve pairwise comparison between two sequences (e.g. sentence similarity and paraphrase identification). Predominantly, two formulations are used for sentence-pair tasks: bi-encoders and cross-encoders.…

Computation and Language · Computer Science 2022-03-15 Fangyu Liu, Yunlong Jiao, Jordan Massiah, Emine Yilmaz, Serhii Havrylov

NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models

Structured pruning methods have proven effective in reducing the model size and accelerating inference speed in various network architectures such as Transformers. Despite the versatility of encoder-decoder models in numerous NLP tasks, the…

Computation and Language · Computer Science 2023-10-17 Jongwoo Ko, Seungjoon Park, Yujin Kim, Sumyeong Ahn, Du-Seong Chang, Euijai Ahn, Se-Young Yun

Explicit Sign-Magnitude Encoders Enable Power-Efficient Multipliers

This work presents a method to maximize power-efficiency of fixed point multiplier units by decomposing them into sub-components. First, an encoder block converts the operands from a two's complement to a sign magnitude representation,…

Neural and Evolutionary Computing · Computer Science 2025-07-25 Felix Arnold, Maxence Bouvier, Ryan Amaudruz, Renzo Andri, Lukas Cavigelli

Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation

Deep learning-based image compression has made great progress recently. However, many leading schemes use a serial context-adaptive entropy model to improve the rate-distortion (R-D) performance, which is very slow. In addition, the…

Image and Video Processing · Electrical Eng. & Systems 2023-09-07 Haisheng Fu, Feng Liang, Jie Liang, Yongqiang Wang, Guohe Zhang, Jingning Han

DDT: Decoupled Diffusion Transformer

Diffusion transformers have demonstrated remarkable generation quality, albeit requiring longer training iterations and numerous inference steps. In each denoising step, diffusion transformers encode the noisy inputs to extract the…

Computer Vision and Pattern Recognition · Computer Science 2025-04-10 Shuai Wang, Zhi Tian, Weilin Huang, Limin Wang

Balancing Cost and Benefit with Tied-Multi Transformers

We propose and evaluate a novel procedure for training multiple Transformers with tied parameters which compresses multiple models into one enabling the dynamic choice of the number of encoder and decoder layers during decoding. In…

Computation and Language · Computer Science 2020-02-21 Raj Dabre, Raphael Rubino, Atsushi Fujita

Label-Looping: Highly Efficient Decoding for Transducers

This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models. We redesign the standard nested-loop design for RNN-T decoding, swapping loops over frames and labels: the outer loop…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-20 Vladimir Bataev, Hainan Xu, Daniel Galvez, Vitaly Lavrukhin, Boris Ginsburg

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

Most practical high-resolution text-to-image systems, including latent diffusion and autoregressive models, perform generation in a compact latent space, and a decoder maps the generated latents back to pixels. Yet the latent-to-pixel…

Computer Vision and Pattern Recognition · Computer Science 2026-05-25 Yifan Lu, Qi Wu, Jay Zhangjie Wu, Zian Wang, Huan Ling, Sanja Fidler, Xuanchi Ren

Progressive Mixed-Precision Decoding for Efficient LLM Inference

In spite of the great potential of large language models (LLMs) across various tasks, their deployment on resource-constrained devices remains challenging due to their excessive computational and memory demands. Quantization has emerged as…

Machine Learning · Computer Science 2025-02-28 Hao Mark Chen, Fuwen Tan, Alexandros Kouris, Royson Lee, Hongxiang Fan, Stylianos I. Venieris

Prompt Guided Transformer for Multi-Task Dense Prediction

Task-conditional architecture offers advantage in parameter efficiency but falls short in performance compared to state-of-the-art multi-decoder methods. How to trade off performance and model parameters is an important and difficult…

Computer Vision and Pattern Recognition · Computer Science 2023-07-31 Yuxiang Lu, Shalayiding Sirejiding, Yue Ding, Chunlin Wang, Hongtao Lu

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms,…

Computation and Language · Computer Science 2022-04-26 Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler

MCSD: An Efficient Language Model with Diverse Fusion

Transformers excel in Natural Language Processing (NLP) due to their prowess in capturing long-term dependencies but suffer from exponential resource consumption with increasing sequence lengths. To address these challenges, we propose MCSD…

Computation and Language · Computer Science 2024-07-12 Hua Yang, Duohai Li, Shiman Li