Related papers: Layer Collapse in Diffusion Language Models

A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs

Autoregressive (AR) language models build representations incrementally via left-to-right prediction, while diffusion language models (dLLMs) are trained through full-sequence denoising. Although recent dLLMs match AR performance, whether…

Computation and Language · Computer Science 2026-05-11 Raghavv Goel, Risheek Garrepalli, Sudhanshu Agrawal, Chris Lott, Mingu Lee, Fatih Porikli

Diffusion Language Models are Super Data Learners

Under strictly controlled pre-training settings, we observe a Crossover: when unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The crossover shifts later…

Machine Learning · Computer Science 2025-11-06 Jinjie Ni, Qian Liu, Longxu Dou, Chao Du, Zili Wang, Hang Yan, Tianyu Pang, Michael Qizhe Shieh

Large Language Diffusion Models

The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised…

Computation and Language · Computer Science 2025-10-21 Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

Diffusion language models (dLMs) have emerged as a promising paradigm that enables parallel, non-autoregressive generation, but their learning efficiency lags behind that of autoregressive (AR) language models when trained from scratch. To…

Computation and Language · Computer Science 2026-05-01 Yonggan Fu, Lexington Whalen, Zhifan Ye, Xin Dong, Shizhe Diao, Jingyu Liu, Chengyue Wu, Hao Zhang, Enze Xie, Song Han, Maksim Khadkevich, Jan Kautz, Yingyan Celine Lin, Pavlo Molchanov

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to…

Computation and Language · Computer Science 2025-06-03 Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong

Lost in Diffusion: Uncovering Hallucination Patterns and Failure Modes in Diffusion Large Language Models

While Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive paradigm comparable to autoregressive (AR) models, their faithfulness, specifically regarding hallucination, remains largely underexplored. To…

Computation and Language · Computer Science 2026-04-14 Zhengnan Guo, Fei Tan

A Survey on Diffusion Language Models

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

Introspective Diffusion Language Models

Diffusion language models promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do…

Artificial Intelligence · Computer Science 2026-04-14 Yifan Yu, Yuqing Jian, Junxiong Wang, Zhongzhu Zhou, Donglin Zhuang, Xinyu Fang, Sri Yanamandra, Xiaoxia Wu, Qingyang Wu, Shuaiwen Leon Song, Tri Dao, Ben Athiwaratkun, James Zou, Fan Lai, Chenfeng Xu

Attention Sinks in Diffusion Language Models

Masked Diffusion Language Models (DLMs) have recently emerged as a promising alternative to traditional Autoregressive Models (ARMs). DLMs employ transformer encoders with bidirectional attention, enabling parallel token generation while…

Computation and Language · Computer Science 2025-12-11 Maximo Eduardo Rulli, Simone Petruzzi, Edoardo Michielon, Fabrizio Silvestri, Simone Scardapane, Alessio Devoto

Scaling Behavior of Discrete Diffusion Language Models

Modern LLM pre-training consumes vast amounts of compute and training data, making the scaling behavior, or scaling laws, of different models a key distinguishing factor. Discrete diffusion language models (DLMs) have been proposed as an…

Machine Learning · Computer Science 2026-02-17 Dimitri von Rütte, Janis Fluri, Omead Pooladzandi, Bernhard Schölkopf, Thomas Hofmann, Antonio Orvieto

DARE: Diffusion Large Language Models Alignment and Reinforcement Executor

Diffusion large language models (dLLMs) are emerging as a compelling alternative to dominant autoregressive models, replacing strictly sequential token generation with iterative denoising and parallel generation dynamics. However, their…

Computation and Language · Computer Science 2026-04-07 Jingyi Yang, Yuxian Jiang, Xuhao Hu, Shuang Cheng, Biqing Qi, Jing Shao

Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based continuous…

Computation and Language · Computer Science 2026-05-19 Zhihan Yang, Wei Guo, Shuibai Zhang, Subham Sekhar Sahoo, Yongxin Chen, Arash Vahdat, Morteza Mardani, John Thickstun

How Efficient Are Diffusion Language Models? A Critical Examination of Efficiency Evaluation Practices

Diffusion language models (DLMs) have emerged as a promising alternative to the long-dominant autoregressive (AR) paradigm, offering a parallelizable decoding process that could yield greater efficiency. Yet, in practice, current open-source…

Computation and Language · Computer Science 2025-11-11 Han Peng, Peiyu Liu, Zican Dong, Daixuan Cheng, Junyi Li, Yiru Tang, Shuo Wang, Wayne Xin Zhao

What Makes Diffusion Language Models Super Data Learners?

Recent studies have shown that diffusion language models achieve remarkable data efficiency under limited-data constraints, yet the underlying mechanisms remain unclear. In this work, we perform extensive ablation experiments to disentangle…

Computation and Language · Computer Science 2025-10-07 Zitian Gao, Haoming Luo, Lynx Chen, Jason Klein Liu, Ran Tao, Joey Zhou, Bryan Dai

Diffusion Language Models Generation Can Be Halted Early

Diffusion Language models (DLMs) are a promising avenue for text generation due to their practical properties on tractable controllable generation. They also have the advantage of not having to predict text autoregressively. However,…

Machine Learning · Computer Science 2024-02-13 Sofia Maria Lo Cicero Vaina, Nikita Balagansky, Daniil Gavrilov

Continuous Latent Diffusion Language Model

Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve generation…

Computation and Language · Computer Science 2026-05-08 Hongcan Guo, Qinyu Zhao, Yian Zhao, Shen Nie, Rui Zhu, Qiushan Guo, Feng Wang, Tao Yang, Hengshuang Zhao, Guoqiang Wei, Yan Zeng

Looped Diffusion Language Models

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer architectures for MDMs remains underexplored. In this paper, we show that…

Machine Learning · Computer Science 2026-05-26 Sanghyun Lee, Chunsan Hong, Seungryong Kim, Jonghyun Lee, Jongho Park, Dongmin Park

DLLM Agent: See Farther, Run Faster

Diffusion large language models (DLLMs) have emerged as an alternative to autoregressive (AR) decoding with appealing efficiency and modeling properties, yet their implications for agentic multi-step decision making remain underexplored. We…

Computation and Language · Computer Science 2026-03-23 Huiling Zhen, Weizhe Lin, Renxi Liu, Kai Han, Yiming Li, Yuchuan Tian, Hanting Chen, Xiaoguang Li, Xiaosong Li, Chen Chen, Xianzhi Yu, Mingxuan Yuan, Youliang Yan, Peifeng Qin, Jun Wang, Yu Wang, Dacheng Tao, Yunhe Wang

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

This paper presents LLaDA2.0 -- a tuple of discrete diffusion large language models (dLLM) scaling up to 100B total parameters through systematic conversion from auto-regressive (AR) models -- establishing a new paradigm for frontier-scale…

Machine Learning · Computer Science 2025-12-25 Tiwei Bie, Maosong Cao, Kun Chen, Lun Du, Mingliang Gong, Zhuochen Gong, Yanmei Gu, Jiaqi Hu, Zenan Huang, Zhenzhong Lan, Chengxi Li, Chongxuan Li, Jianguo Li, Zehuan Li, Huabin Liu, Lin Liu, Guoshan Lu, Xiaocheng Lu, Yuxin Ma, Jianfeng Tan, Lanning Wei, Ji-Rong Wen, Yipeng Xing, Xiaolu Zhang, Junbo Zhao, Da Zheng, Jun Zhou, Junlin Zhou, Zhanchao Zhou, Liwang Zhu, Yihong Zhuang

Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning

Diffusion large language models (dLLMs) generate text via iterative denoising but consistently underperform on multi-step reasoning. We hypothesize this gap stems from a coordination problem: AR models build coherence token-by-token, while…

Artificial Intelligence · Computer Science 2026-03-17 Earl J St Sauver