Related papers: Why Diffusion Language Models Struggle with Truly …

WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference

Autoregressive (AR) generation is the standard decoding paradigm for Large Language Models (LLMs), but its token-by-token nature limits parallelism at inference time. Diffusion Language Models (DLLMs) offer parallel decoding by recovering…

Computation and Language · Computer Science 2025-12-30 Aiwei Liu , Minghua He , Shaoxun Zeng , Sijun Zhang , Linhao Zhang , Chuhan Wu , Wei Jia , Yuan Liu , Xiao Zhou , Jie Zhou

A Survey on Diffusion Language Models

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li , Mingda Chen , Bowei Guo , Zhiqiang Shen

How Efficient Are Diffusion Language Models? A Critical Examination of Efficiency Evaluation Practices

Diffusion language models (DLMs) have emerged as a promising alternative to the long-dominant autoregressive (AR) paradigm, offering a parallelable decoding process that could yield greater efficiency. Yet, in practice, current open-source…

Computation and Language · Computer Science 2025-11-11 Han Peng , Peiyu Liu , Zican Dong , Daixuan Cheng , Junyi Li , Yiru Tang , Shuo Wang , Wayne Xin Zhao

Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models

Large Language Models (LLMs) have achieved state-of-the-art performance on a broad range of Natural Language Processing (NLP) tasks, including document processing and code generation. Autoregressive Language Models (ARMs), which generate…

Machine Learning · Computer Science 2025-12-16 Minseo Kim , Coleman Hooper , Aditya Tomar , Chenfeng Xu , Mehrdad Farajtabar , Michael W. Mahoney , Kurt Keutzer , Amir Gholami

Discrete Diffusion Models Exploit Asymmetry to Solve Lookahead Planning Tasks

While Autoregressive (AR) Transformer-based Generative Language Models are frequently employed for lookahead tasks, recent research suggests a potential discrepancy in their ability to perform planning tasks that require multi-step…

Machine Learning · Computer Science 2026-02-24 Itamar Trainin , Shauli Ravfogel , Omri Abend , Amir Feder

Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive (AR) LLMs for text generation, with the potential to decode multiple tokens in a single iteration. However, none of the existing open-source…

Machine Learning · Computer Science 2025-08-14 Xu Wang , Chenkai Xu , Yijie Jin , Jiachun Jin , Hao Zhang , Zhijie Deng

Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning

Autoregressive (AR) language models generate text one token at a time, which limits their inference speed. Diffusion-based language models offer a promising alternative, as they can decode multiple tokens in parallel. However, we identify a…

Computation and Language · Computer Science 2025-10-27 Yeongbin Seo , Dongha Lee , Jaehyung Kim , Jinyoung Yeo

MDM-ASR: Bridging Accuracy and Efficiency in ASR with Diffusion-Based Non-Autoregressive Decoding

In sequence-to-sequence Transformer ASR, autoregressive (AR) models achieve strong accuracy but suffer from slow decoding, while non-autoregressive (NAR) models enable parallel decoding at the cost of degraded performance. We propose a…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-26 Hao Yen , Pin-Jui Ku , Ante Jukić , Sabato Marco Siniscalchi

Discrete Diffusion in Large Language and Multimodal Models: A Survey

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel…

Machine Learning · Computer Science 2025-09-22 Runpeng Yu , Qi Li , Xinchao Wang

Locally Coherent Parallel Decoding in Diffusion Language Models

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models, offering sub-linear generation latency and bidirectional capabilities that are particularly appealing for code generation and editing.…

Computation and Language · Computer Science 2026-05-19 Michael Hersche , Nicolas Menet , Ronan Tanios , Abbas Rahimi

Diffusion Language Models are Provably Optimal Parallel Samplers

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive models for faster inference via parallel token generation. We provide a rigorous foundation for this advantage by formalizing a model of parallel…

Machine Learning · Computer Science 2026-01-01 Haozhe Jiang , Nika Haghtalab , Lijie Chen

Introspective Diffusion Language Models

Diffusion language models promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We stem this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do…

Artificial Intelligence · Computer Science 2026-04-14 Yifan Yu , Yuqing Jian , Junxiong Wang , Zhongzhu Zhou , Donglin Zhuang , Xinyu Fang , Sri Yanamandra , Xiaoxia Wu , Qingyang Wu , Shuaiwen Leon Song , Tri Dao , Ben Athiwaratkun , James Zou , Fan Lai , Chenfeng Xu

A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs

Autoregressive (AR) language models build representations incrementally via left-to-right prediction, while diffusion language models (dLLMs) are trained through full-sequence denoising. Although recent dLLMs match AR performance, whether…

Computation and Language · Computer Science 2026-05-11 Raghavv Goel , Risheek Garrepalli , Sudhanshu Agrawal , Chris Lott , Mingu Lee , Fatih Porikli

d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation

Diffusion large language models (dLLMs) offer capabilities beyond those of autoregressive (AR) LLMs, such as parallel decoding and random-order generation. However, realizing these benefits in practice is non-trivial, as dLLMs inherently…

Machine Learning · Computer Science 2026-01-30 Yu-Yang Qian , Junda Su , Lanxiang Hu , Peiyuan Zhang , Zhijie Deng , Peng Zhao , Hao Zhang

Divide and Conquer: Accelerating Diffusion-Based Large Language Models via Adaptive Parallel Decoding

Diffusion-based large language models (dLLMs) have shown promising performance across various reasoning tasks, establishing themselves as an alternative to autoregressive large language models (LLMs). Unlike autoregressive LLMs that…

Computation and Language · Computer Science 2026-03-02 Xiangzhong Luo , Yilin An , Zhicheng Yu , Weichen Liu , Xu Yang

DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models

Diffusion-based decoding has recently emerged as an appealing alternative to autoregressive (AR) generation, offering the potential to update multiple tokens in parallel and reduce latency. However, diffusion vision language models (dVLMs)…

Computer Vision and Pattern Recognition · Computer Science 2026-04-01 Lunbin Zeng , Jingfeng Yao , Bencheng Liao , Hongyuan Tao , Wenyu Liu , Xinggang Wang

DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive language generation due to their potential for parallel decoding and global refinement of the entire sequence. To unlock this potential, DLM…

Machine Learning · Computer Science 2026-04-20 Xiang Xia , Wuyang Zhang , Jiazheng Liu , Cheng Yan , Yanyong Zhang

Parallel Thinking, Sequential Answering: Bridging NAR and AR for Efficient Reasoning

We study reasoning tasks through a framework that integrates auto-regressive (AR) and non-autoregressive (NAR) language models. AR models, which generate text sequentially, excel at producing coherent outputs but often suffer from slow…

Artificial Intelligence · Computer Science 2025-09-26 Qihang Ai , Haiyun Jiang

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

Diffusion language models (dLMs) have emerged as a promising paradigm that enables parallel, non-autoregressive generation, but their learning efficiency lags behind that of autoregressive (AR) language models when trained from scratch. To…

Computation and Language · Computer Science 2026-05-01 Yonggan Fu , Lexington Whalen , Zhifan Ye , Xin Dong , Shizhe Diao , Jingyu Liu , Chengyue Wu , Hao Zhang , Enze Xie , Song Han , Maksim Khadkevich , Jan Kautz , Yingyan Celine Lin , Pavlo Molchanov

Accelerating Diffusion LLMs via Adaptive Parallel Decoding

The generation speed of LLMs are bottlenecked by autoregressive decoding, where tokens are predicted sequentially one by one. Alternatively, diffusion large language models (dLLMs) theoretically allow for parallel token generation, but in…

Computation and Language · Computer Science 2025-11-03 Daniel Israel , Guy Van den Broeck , Aditya Grover