Related papers: Diffusion Language Models are Super Data Learners

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to…

Computation and Language · Computer Science 2025-06-03 Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong

What Makes Diffusion Language Models Super Data Learners?

Recent studies have shown that diffusion language models achieve remarkable data efficiency under limited-data constraints, yet the underlying mechanisms remain unclear. In this work, we perform extensive ablation experiments to disentangle…

Computation and Language · Computer Science 2025-10-07 Zitian Gao, Haoming Luo, Lynx Chen, Jason Klein Liu, Ran Tao, Joey Zhou, Bryan Dai

A Survey on Diffusion Language Models

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

How Efficient Are Diffusion Language Models? A Critical Examination of Efficiency Evaluation Practices

Diffusion language models (DLMs) have emerged as a promising alternative to the long-dominant autoregressive (AR) paradigm, offering a parallelizable decoding process that could yield greater efficiency. Yet, in practice, current open-source…

Computation and Language · Computer Science 2025-11-11 Han Peng, Peiyu Liu, Zican Dong, Daixuan Cheng, Junyi Li, Yiru Tang, Shuo Wang, Wayne Xin Zhao

Diffusion Beats Autoregressive in Data-Constrained Settings

Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages…

Machine Learning · Computer Science 2025-10-28 Mihir Prabhudesai, Mengning Wu, Amir Zadeh, Katerina Fragkiadaki, Deepak Pathak

Autoregressive vs. Masked Diffusion Language Models: A Controlled Comparison

We present a controlled empirical comparison between autoregressive (AR) and masked diffusion (MDLM) language models. Both models are trained on identical data (50M tokens from TinyStories), identical compute budget (20,000 steps, batch…

Computation and Language · Computer Science 2026-03-24 Caio Vicentino

Layer Collapse in Diffusion Language Models

Diffusion language models (DLMs) have recently emerged as competitive alternatives to autoregressive (AR) language models, yet differences in their activation dynamics remain poorly understood. We characterize these dynamics in LLaDA-8B and…

Machine Learning · Computer Science 2026-05-12 Alexander Conzelmann, Albert Catalan-Tatjer, Shiwei Liu

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

Diffusion language models (dLMs) have emerged as a promising paradigm that enables parallel, non-autoregressive generation, but their learning efficiency lags behind that of autoregressive (AR) language models when trained from scratch. To…

Computation and Language · Computer Science 2026-05-01 Yonggan Fu, Lexington Whalen, Zhifan Ye, Xin Dong, Shizhe Diao, Jingyu Liu, Chengyue Wu, Hao Zhang, Enze Xie, Song Han, Maksim Khadkevich, Jan Kautz, Yingyan Celine Lin, Pavlo Molchanov

A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs

Autoregressive (AR) language models build representations incrementally via left-to-right prediction, while diffusion language models (dLLMs) are trained through full-sequence denoising. Although recent dLLMs match AR performance, whether…

Computation and Language · Computer Science 2026-05-11 Raghavv Goel, Risheek Garrepalli, Sudhanshu Agrawal, Chris Lott, Mingu Lee, Fatih Porikli

Breaking AR's Sampling Bottleneck: Provable Acceleration via Diffusion Language Models

Diffusion models have emerged as a powerful paradigm for modern generative modeling, demonstrating strong potential for large language models (LLMs). Unlike conventional autoregressive (AR) models that generate tokens sequentially,…

Machine Learning · Computer Science 2026-01-09 Gen Li, Changxiao Cai

Anchored Diffusion Language Model

Diffusion Language Models (DLMs) promise parallel generation and bidirectional context, yet they underperform autoregressive (AR) models in both likelihood modeling and generated text quality. We identify that this performance gap arises…

Computation and Language · Computer Science 2025-05-27 Litu Rout, Constantine Caramanis, Sanjay Shakkottai

Introspective Diffusion Language Models

Diffusion language models promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do…

Artificial Intelligence · Computer Science 2026-04-14 Yifan Yu, Yuqing Jian, Junxiong Wang, Zhongzhu Zhou, Donglin Zhuang, Xinyu Fang, Sri Yanamandra, Xiaoxia Wu, Qingyang Wu, Shuaiwen Leon Song, Tri Dao, Ben Athiwaratkun, James Zou, Fan Lai, Chenfeng Xu

DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models

Diffusion-based decoding has recently emerged as an appealing alternative to autoregressive (AR) generation, offering the potential to update multiple tokens in parallel and reduce latency. However, diffusion vision language models (dVLMs)…

Computer Vision and Pattern Recognition · Computer Science 2026-04-01 Lunbin Zeng, Jingfeng Yao, Bencheng Liao, Hongyuan Tao, Wenyu Liu, Xinggang Wang

Scaling Behavior of Discrete Diffusion Language Models

Modern LLM pre-training consumes vast amounts of compute and training data, making the scaling behavior, or scaling laws, of different models a key distinguishing factor. Discrete diffusion language models (DLMs) have been proposed as an…

Machine Learning · Computer Science 2026-02-17 Dimitri von Rütte, Janis Fluri, Omead Pooladzandi, Bernhard Schölkopf, Thomas Hofmann, Antonio Orvieto

Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models

Large Language Models (LLMs) have achieved state-of-the-art performance on a broad range of Natural Language Processing (NLP) tasks, including document processing and code generation. Autoregressive Language Models (ARMs), which generate…

Machine Learning · Computer Science 2025-12-16 Minseo Kim, Coleman Hooper, Aditya Tomar, Chenfeng Xu, Mehrdad Farajtabar, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

Lost in Diffusion: Uncovering Hallucination Patterns and Failure Modes in Diffusion Large Language Models

While Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive paradigm comparable to autoregressive (AR) models, their faithfulness, specifically regarding hallucination, remains largely underexplored. To…

Computation and Language · Computer Science 2026-04-14 Zhengnan Guo, Fei Tan

Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning

Diffusion large language models (dLLMs) generate text via iterative denoising but consistently underperform on multi-step reasoning. We hypothesize this gap stems from a coordination problem: AR models build coherence token-by-token, while…

Artificial Intelligence · Computer Science 2026-03-17 Earl J St Sauver

Transfer Learning for Text Diffusion Models

In this report, we explore the potential for text diffusion to replace autoregressive (AR) decoding for the training and deployment of large language models (LLMs). We are particularly interested to see whether pretrained AR models can be…

Computation and Language · Computer Science 2024-01-31 Kehang Han, Kathleen Kenealy, Aditya Barua, Noah Fiedel, Noah Constant

Are Diffusion Language Models Good Database Analysts?

Recent advancements in large language models (LLMs) have significantly improved Natural Language to SQL (NL2SQL) tasks, yet most NL2SQL systems continue to rely on the autoregressive (AR) paradigm. The highly structured nature of SQL makes…

Databases · Computer Science 2026-05-28 Peixian Ma, Xialie Zhuang, Jiantao Tan, Changlun Li, Ruirui Chen, Chengwei Qin

Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models

We propose TraceRL, a trajectory-aware reinforcement learning framework for diffusion language models (DLMs) that incorporates preferred inference trajectory into post-training, and is applicable across different architectures. Equipped…

Computation and Language · Computer Science 2025-09-09 Yinjie Wang, Ling Yang, Bowen Li, Ye Tian, Ke Shen, Mengdi Wang