Related papers: Blockwise Parallel Decoding for Deep Autoregressiv…

200 papers

We present Locality-aware Parallel Decoding (LPD) to accelerate autoregressive image generation. Traditional autoregressive image generation relies on next-patch prediction, a memory-bound process that leads to high latency. Existing works…

Computer Vision and Pattern Recognition · Computer Science 2026-03-12 Zhuoyang Zhang, Luke J. Huang, Chengyue Wu, Shang Yang, Kelly Peng, Yao Lu, Song Han

Transformers have emerged as the cornerstone of state-of-the-art natural language processing models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands posed by the self-attention…

Computation and Language · Computer Science 2023-08-30 Hao Liu, Pieter Abbeel

Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation. Blockwise parallel decoding (BPD) was proposed by Stern et al. as…

Computation and Language · Computer Science 2024-06-06 Taehyeon Kim, Ananda Theertha Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton

Autoregressive sequence models based on deep neural networks, such as RNNs, Wavenet and the Transformer attain state-of-the-art results on many tasks. However, they are difficult to parallelize and are thus slow at processing long…

Machine Learning · Computer Science 2018-06-11 Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer

Autoregressive models have emerged as a powerful approach for visual generation but suffer from slow inference speed due to their sequential token-by-token prediction process. In this paper, we propose a simple yet effective approach for…

Computer Vision and Pattern Recognition · Computer Science 2025-04-04 Yuqing Wang, Shuhuai Ren, Zhijie Lin, Yujin Han, Haoyuan Guo, Zhenheng Yang, Difan Zou, Jiashi Feng, Xihui Liu

Continuous visual autoregressive (AR) models have demonstrated promising performance in image generation. However, the heavy autoregressive inference burden imposes significant overhead. In Large Language Models (LLMs), speculative decoding…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Zili Wang, Robert Zhang, Kun Ding, Qi Yang, Fei Li, Shiming Xiang

Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherent sequential nature. For many years this bottleneck has persisted, as many thought…

Machine Learning · Computer Science 2024-01-17 Yi Heng Lim, Qi Zhu, Joshua Selfridge, Muhammad Firmansyah Kasim

As the basis of generative AI, an autoregressive model requires the generation of a new token depending on all the previously generated tokens, which brings high quality but also restricts the model to generate tokens one by one, forming a…

Computation and Language · Computer Science 2025-07-02 Zixian Huang, Chenxu Niu, Yu Gu, Gengyang Xiao, Xinwei Huang, Gong Cheng

Recent advances in reasoning models have demonstrated significant improvements in accuracy by employing detailed and comprehensive reasoning processes. However, generating these lengthy reasoning sequences is computationally expensive and…

Computation and Language · Computer Science 2025-08-27 Yijiong Yu

Autoregressive Transformer models have demonstrated impressive performance in video generation, but their sequential token-by-token decoding process poses a major bottleneck, particularly for long videos represented by tens of thousands of…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Yang Ye, Junliang Guo, Haoyu Wu, Tianyu He, Tim Pearce, Tabish Rashid, Katja Hofmann, Jiang Bian

Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands imposed by Transformers limit their ability…

Computation and Language · Computer Science 2023-11-28 Hao Liu, Matei Zaharia, Pieter Abbeel

Inspired by the remarkable success of autoregressive models in language modeling, this paradigm has been widely adopted in visual generation. However, the sequential token-by-token decoding mechanism inherent in traditional autoregressive…

Computer Vision and Pattern Recognition · Computer Science 2026-01-01 Siyang Wang, Hanting Li, Wei Li, Jie Hu, Xinghao Chen, Feng Zhao

Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT). The community proposed specific network architectures and learning-based methods to solve this issue, which are expensive and require changes to the…

Computation and Language · Computer Science 2025-02-06 Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, Riccardo Marin, Emanuele Rodolà

Large language models (LLMs) are increasingly used for long-content generation (e.g., long Chain-of-Thought reasoning) where decoding efficiency becomes a critical bottleneck: Autoregressive decoding is inherently limited by its sequential…

Computation and Language · Computer Science 2025-06-05 Zhepei Wei, Wei-Lin Chen, Xinyu Zhu, Yu Meng

Discrete normalizing flows are promising generative models with advantages such as analytical log-likelihood computation and end-to-end training. However, the architectural constraints to ensure invertibility and tractable Jacobian…

Machine Learning · Computer Science 2026-05-06 Jiaru Zhang, Juanwu Lu, Xiaoyu Wu, Ziran Wang, Ruqi Zhang

Autoregressive decoding is the only part of sequence-to-sequence models that prevents them from massive parallelization at inference time. Non-autoregressive models enable the decoder to generate all output symbols independently in…

Computation and Language · Computer Science 2018-11-13 Jindřich Libovický, Jindřich Helcl

Recent advances in large language models have shown that autoregressive modeling can generate complex and novel sequences that have many real-world applications. However, these models must generate outputs autoregressively, which becomes…

Machine Learning · Computer Science 2023-06-05 Asier Mujika

Much recent effort has been invested in non-autoregressive neural machine translation, which appears to be an efficient alternative to state-of-the-art autoregressive machine translation on modern GPUs. In contrast to the latter, where…

Computation and Language · Computer Science 2021-06-28 Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith

Sequential dependencies present a fundamental bottleneck in deploying large-scale autoregressive models, particularly for real-time applications. While traditional optimization approaches like pruning and quantization often compromise model…

Computation and Language · Computer Science 2025-10-09 Yunhai Hu, Zining Liu, Zhenyuan Dong, Tianfan Peng, Bradley McDanel, Sai Qian Zhang

Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV)…

Computation and Language · Computer Science 2026-03-06 Jia-Nan Li, Jian Guan, Wei Wu, Chongxuan Li