Related papers: Latent Refinement Decoding: Enhancing Diffusion-Ba…

200 papers

Masked diffusion language models enable parallel token generation and offer improved decoding efficiency over autoregressive models. However, their performance degrades significantly when generating multiple tokens simultaneously, due to a…

Computation and Language · Computer Science 2026-05-12 Houxing Ren, Mingjie Zhan, Zimu Lu, Ke Wang, Yunqiao Yang, Haotian Hou, Junting Pan, Hongsheng Li

Diffusion language models enable parallel token generation through block-wise decoding, but their irreversible commitments can lead to stagnation, where the reverse diffusion process fails to make further progress under a suboptimal…

Computation and Language · Computer Science 2026-02-03 Xinyun Wang, Min Zhang, Sen Cui, Zhikang Chen, Bo Jiang, Kun Kuang, Mingbao Lin

Autoregressive (AR) Large Language Models (LLMs) have demonstrated significant success across numerous tasks. However, the AR modeling paradigm presents certain limitations; for instance, contemporary autoregressive LLMs are trained to…

Machine Learning · Computer Science 2025-02-10 Justin Deschenaux, Caglar Gulcehre

The generation speed of LLMs is bottlenecked by autoregressive decoding, where tokens are predicted sequentially one by one. Alternatively, diffusion large language models (dLLMs) theoretically allow for parallel token generation, but in…

Computation and Language · Computer Science 2025-11-03 Daniel Israel, Guy Van den Broeck, Aditya Grover

Discrete diffusion models have emerged as a powerful class of models and a promising route to fast language generation, but practical implementations typically rely on factored reverse transitions ignoring cross-token dependencies and…

Machine Learning · Computer Science 2026-05-14 Dario Shariatian, Alain Durmus, Umut Simsekli, Stefano Peluchetti

Autoregressive (AR) generation is the standard decoding paradigm for Large Language Models (LLMs), but its token-by-token nature limits parallelism at inference time. Diffusion Language Models (DLLMs) offer parallel decoding by recovering…

Computation and Language · Computer Science 2025-12-30 Aiwei Liu, Minghua He, Shaoxun Zeng, Sijun Zhang, Linhao Zhang, Chuhan Wu, Wei Jia, Yuan Liu, Xiao Zhou, Jie Zhou

Autoregressive (AR) language models generate text one token at a time, which limits their inference speed. Diffusion-based language models offer a promising alternative, as they can decode multiple tokens in parallel. However, we identify a…

Computation and Language · Computer Science 2025-10-27 Yeongbin Seo, Dongha Lee, Jaehyung Kim, Jinyoung Yeo

Large Language Models (LLMs) demonstrate their reasoning ability through chain-of-thought (CoT) generation. However, LLMs' autoregressive decoding may limit the ability to revisit and refine earlier tokens in a holistic manner, which can…

Machine Learning · Computer Science 2026-04-24 Haoqiang Kang, Yizhe Zhang, Nikki Lijing Kuang, Nicklas Majamaki, Navdeep Jaitly, Yi-An Ma, Lianhui Qin

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

Diffusion language models offer parallel token generation and inherent bidirectionality, promising more efficient and powerful sequence modeling compared to autoregressive approaches. However, state-of-the-art diffusion models (e.g., Dream…

Computation and Language · Computer Science 2025-10-10 Zhanqiu Hu, Jian Meng, Yash Akhauri, Mohamed S. Abdelfattah, Jae-sun Seo, Zhiru Zhang, Udit Gupta

Continuous diffusion and flow models are attractive for non-autoregressive text generation because they can update all positions in parallel. A major difficulty is the interface between continuous latent states and discrete tokens. This…

Computation and Language · Computer Science 2026-05-18 De Shuai Zhang

Diffusion-based large language models (Diffusion LLMs) have shown promise for non-autoregressive text generation with parallel decoding capabilities. However, the practical inference speed of open-sourced Diffusion LLMs often lags behind…

Computation and Language · Computer Science 2025-07-04 Chengyue Wu, Hao Zhang, Shuchen Xue, Zhijian Liu, Shizhe Diao, Ligeng Zhu, Ping Luo, Song Han, Enze Xie

Generative recommendation represents each item as a semantic ID, i.e., a sequence of discrete tokens, and generates the next item through autoregressive decoding. While effective, existing autoregressive models face two intrinsic…

Information Retrieval · Computer Science 2025-11-12 Teng Shi, Chenglei Shen, Weijie Yu, Shen Nie, Chongxuan Li, Xiao Zhang, Ming He, Yan Han, Jun Xu

Diffusion language models (DLMs) have recently emerged as an alternative to autoregressive approaches, offering parallel sequence generation and flexible token orders. However, their inference remains slower than that of autoregressive…

Computation and Language · Computer Science 2026-04-10 Pengxiang Li, Yefan Zhou, Dilxat Muhtar, Lu Yin, Shilin Yan, Li Shen, Soroush Vosoughi, Shiwei Liu

Latent diffusion models offer an attractive alternative to discrete diffusion for non-autoregressive text generation by operating on continuous text representations and denoising entire sequences in parallel. The major challenge in latent…

Computation and Language · Computer Science 2026-05-11 Viacheslav Meshchaninov, Alexander Shabalin, Egor Chimbulatov, Nikita Gushchin, Ilya Koziev, Alexander Korotin, Dmitry Vetrov

Autoregressive decoding in large language models (LLMs) requires $\mathcal{O}(n)$ sequential steps for $n$ tokens, fundamentally limiting inference throughput. Recent diffusion-based LLMs (dLLMs) enable parallel token generation through…

Computation and Language · Computer Science 2025-10-06 Wenrui Bao, Zhiben Chen, Dan Xu, Yuzhang Shang

Diffusion language models generate text through iterative denoising under a uniform refinement rule applied to all tokens. However, tokens stabilize at different rates in practice, leading to substantial redundant refinement and motivating…

Artificial Intelligence · Computer Science 2026-03-06 Lipeng Wan, Jianhui Gu, Junjie Ma, Jianguo Huang, Shiguang Sun, Siyuan Li, Xuguang Lan

Most multi-agent systems rely exclusively on autoregressive language models (ARMs) that are based on sequential generation. Although effective for fluent text, ARMs limit global reasoning and plan revision. On the other hand, Discrete…

Machine Learning · Computer Science 2026-03-11 Lina Berrayana, Ahmed Heakl, Abdullah Sohail, Thomas Hofmann, Salman Khan, Wei Chen

Discrete diffusion models have recently become competitive with autoregressive models for language modeling, even outperforming them on reasoning tasks requiring planning and global coherence, but they require more computation at inference…

Machine Learning · Computer Science 2026-02-04 Andre He, Sean Welleck, Daniel Fried

Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV)…

Computation and Language · Computer Science 2026-03-06 Jia-Nan Li, Jian Guan, Wei Wu, Chongxuan Li