Related papers: Fast Interleaved Bidirectional Sequence Generation

Efficient Bidirectional Neural Machine Translation

Encoder-decoder based neural machine translation usually generates a target sequence token by token from left to right. Due to error propagation, the tokens on the right side of the generated sequence are usually of poorer quality than…

Computation and Language · Computer Science 2019-08-27 Xu Tan, Yingce Xia, Lijun Wu, Tao Qin

Sequence Generation: From Both Sides to the Middle

The encoder-decoder framework has achieved promising progress for many sequence generation tasks, such as neural machine translation and text summarization. Such a framework usually generates a sequence token by token from left to right,…

Computation and Language · Computer Science 2019-06-25 Long Zhou, Jiajun Zhang, Chengqing Zong, Heng Yu

Synchronous Bidirectional Inference for Neural Sequence Generation

In sequence to sequence generation tasks (e.g. machine translation and abstractive summarization), inference is generally performed in a left-to-right manner to produce the result token by token. The neural approaches, such as LSTM and…

Computation and Language · Computer Science 2019-02-26 Jiajun Zhang, Long Zhou, Yang Zhao, Chengqing Zong

A Framework for Bidirectional Decoding: Case Study in Morphological Inflection

Transformer-based encoder-decoder models that generate outputs in a left-to-right fashion have become standard for sequence-to-sequence tasks. In this paper, we propose a framework for decoding that produces sequences from the "outside-in":…

Computation and Language · Computer Science 2023-10-31 Marc E. Canby, Julia Hockenmaier

Pipelined Decoder for Efficient Context-Aware Text Generation

As the basis of generative AI, an autoregressive model requires the generation of a new token depending on all the previously generated tokens, which brings high quality but also restricts the model to generate tokens one by one, forming a…

Computation and Language · Computer Science 2025-07-02 Zixian Huang, Chenxu Niu, Yu Gu, Gengyang Xiao, Xinwei Huang, Gong Cheng

Middle-Out Decoding

Despite being virtually ubiquitous, sequence-to-sequence models are challenged by their lack of diversity and inability to be externally controlled. In this paper, we speculate that a fundamental shortcoming of sequence generation models is…

Computation and Language · Computer Science 2018-10-30 Shikib Mehri, Leonid Sigal

Mask-Predict: Parallel Decoding of Conditional Masked Language Models

Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a…

Computation and Language · Computer Science 2019-09-05 Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer

Synchronous Bidirectional Neural Machine Translation

Existing approaches to neural machine translation (NMT) generate the target language sequence token by token from left to right. However, this kind of unidirectional decoding framework cannot make full use of the target-side future contexts…

Computation and Language · Computer Science 2019-05-14 Long Zhou, Jiajun Zhang, Chengqing Zong

Attending to Future Tokens For Bidirectional Sequence Generation

Neural sequence generation is typically performed token-by-token and left-to-right. Whenever a token is generated only previously produced tokens are taken into consideration. In contrast, for problems such as sequence classification,…

Machine Learning · Statistics 2019-09-18 Carolin Lawrence, Bhushan Kotnis, Mathias Niepert

Blockwise Parallel Decoding for Deep Autoregressive Models

Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. While common architecture classes such as recurrent, convolutional, and self-attention networks make…

Machine Learning · Computer Science 2018-11-09 Mitchell Stern, Noam Shazeer, Jakob Uszkoreit

Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models

Recently, Transformer-based encoder-decoder models have demonstrated strong performance in multilingual speech recognition. However, the decoder's autoregressive nature and large size introduce significant bottlenecks during inference.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-28 Yunkyu Lim, Jihwan Park, Hyung Yong Kim, Hanbin Lee, Byeong-Yeol Kim

Non-Autoregressive Machine Translation with Disentangled Context Transformer

State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens. The sequential nature of this generation process causes fundamental latency in…

Computation and Language · Computer Science 2020-07-01 Jungo Kasai, James Cross, Marjan Ghazvininejad, Jiatao Gu

Transformer with Bidirectional Decoder for Speech Recognition

Attention-based models have made tremendous progress on end-to-end automatic speech recognition (ASR) recently. However, the conventional transformer-based approaches usually generate the sequence results token by token from left to right,…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-12 Xi Chen, Songyang Zhang, Dandan Song, Peng Ouyang, Shouyi Yin

Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model

Autoregressive image generation aims to predict the next token based on previous ones. However, this process is challenged by the bidirectional dependencies inherent in conventional image tokenizations, which creates a fundamental…

Computer Vision and Pattern Recognition · Computer Science 2026-02-17 Pingyu Wu, Kai Zhu, Yu Liu, Longxiang Tang, Jian Yang, Yansong Peng, Wei Zhai, Yang Cao, Zheng-Jun Zha

FastSeq: Make Sequence Generation Faster

Transformer-based models have made a tremendous impact in natural language generation. However, inference speed is a bottleneck due to the large model size and the intensive computation involved in the auto-regressive decoding process. We develop…

Computation and Language · Computer Science 2021-07-14 Yu Yan, Fei Hu, Jiusheng Chen, Nikhil Bhendawade, Ting Ye, Yeyun Gong, Nan Duan, Desheng Cui, Bingyu Chi, Ruofei Zhang

Accelerating Transformer Inference for Translation via Parallel Decoding

Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT). The community proposed specific network architectures and learning-based methods to solve this issue, which are expensive and require changes to the…

Computation and Language · Computer Science 2025-02-06 Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, Riccardo Marin, Emanuele Rodolà

Set Block Decoding is a Language Model Inference Accelerator

Autoregressive next token prediction language models offer powerful capabilities but face significant challenges in practical deployment due to the high computational and memory costs of inference, particularly during the decoding stage. We…

Machine Learning · Computer Science 2025-09-05 Itai Gat, Heli Ben-Hamu, Marton Havasi, Daniel Haziza, Jeremy Reizenstein, Gabriel Synnaeve, David Lopez-Paz, Brian Karrer, Yaron Lipman

Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion

Speculative decoding has emerged as a widely adopted method to accelerate large language model inference without sacrificing the quality of the model outputs. While this technique has facilitated notable speed improvements by enabling…

Computation and Language · Computer Science 2025-02-12 Jacob K Christopher, Brian R Bartoldson, Tal Ben-Nun, Michael Cardei, Bhavya Kailkhura, Ferdinando Fioretto

Fast Autoregressive Video Generation with Diagonal Decoding

Autoregressive Transformer models have demonstrated impressive performance in video generation, but their sequential token-by-token decoding process poses a major bottleneck, particularly for long videos represented by tens of thousands of…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Yang Ye, Junliang Guo, Haoyu Wu, Tianyu He, Tim Pearce, Tabish Rashid, Katja Hofmann, Jiang Bian

Lossless Acceleration for Seq2seq Generation with Aggressive Decoding

We study lossless acceleration for seq2seq generation with a novel decoding algorithm -- Aggressive Decoding. Unlike the previous efforts (e.g., non-autoregressive decoding) speeding up seq2seq generation at the cost of quality loss, our…

Computation and Language · Computer Science 2022-05-23 Tao Ge, Heming Xia, Xin Sun, Si-Qing Chen, Furu Wei