Related papers: Layer Collapse in Diffusion Language Models

200 papers

Autoregressive (AR) language models build representations incrementally via left-to-right prediction, while diffusion language models (dLLMs) are trained through full-sequence denoising. Although recent dLLMs match AR performance, whether…

Computation and Language · Computer Science 2026-05-11 Raghavv Goel , Risheek Garrepalli , Sudhanshu Agrawal , Chris Lott , Mingu Lee , Fatih Porikli

Under strictly controlled pre-training settings, we observe a Crossover: when unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The crossover shifts later…

Machine Learning · Computer Science 2025-11-06 Jinjie Ni , Qian Liu , Longxu Dou , Chao Du , Zili Wang , Hang Yan , Tianyu Pang , Michael Qizhe Shieh

The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised…

Computation and Language · Computer Science 2025-10-21 Shen Nie , Fengqi Zhu , Zebin You , Xiaolu Zhang , Jingyang Ou , Jun Hu , Jun Zhou , Yankai Lin , Ji-Rong Wen , Chongxuan Li

Diffusion language models (dLMs) have emerged as a promising paradigm that enables parallel, non-autoregressive generation, but their learning efficiency lags behind that of autoregressive (AR) language models when trained from scratch. To…

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to…

Computation and Language · Computer Science 2025-06-03 Shansan Gong , Shivam Agarwal , Yizhe Zhang , Jiacheng Ye , Lin Zheng , Mukai Li , Chenxin An , Peilin Zhao , Wei Bi , Jiawei Han , Hao Peng , Lingpeng Kong

While Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive paradigm comparable to autoregressive (AR) models, their faithfulness, specifically regarding hallucination, remains largely underexplored. To…

Computation and Language · Computer Science 2026-04-14 Zhengnan Guo , Fei Tan

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li , Mingda Chen , Bowei Guo , Zhiqiang Shen

Diffusion language models promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do…

Masked Diffusion Language Models (DLMs) have recently emerged as a promising alternative to traditional Autoregressive Models (ARMs). DLMs employ transformer encoders with bidirectional attention, enabling parallel token generation while…

Computation and Language · Computer Science 2025-12-11 Maximo Eduardo Rulli , Simone Petruzzi , Edoardo Michielon , Fabrizio Silvestri , Simone Scardapane , Alessio Devoto

Modern LLM pre-training consumes vast amounts of compute and training data, making the scaling behavior, or scaling laws, of different models a key distinguishing factor. Discrete diffusion language models (DLMs) have been proposed as an…

Machine Learning · Computer Science 2026-02-17 Dimitri von Rütte , Janis Fluri , Omead Pooladzandi , Bernhard Schölkopf , Thomas Hofmann , Antonio Orvieto

Diffusion large language models (dLLMs) are emerging as a compelling alternative to dominant autoregressive models, replacing strictly sequential token generation with iterative denoising and parallel generation dynamics. However, their…

Computation and Language · Computer Science 2026-04-07 Jingyi Yang , Yuxian Jiang , Xuhao Hu , Shuang Cheng , Biqing Qi , Jing Shao

While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based continuous…

Computation and Language · Computer Science 2026-05-19 Zhihan Yang , Wei Guo , Shuibai Zhang , Subham Sekhar Sahoo , Yongxin Chen , Arash Vahdat , Morteza Mardani , John Thickstun

Diffusion language models (DLMs) have emerged as a promising alternative to the long-dominant autoregressive (AR) paradigm, offering a parallelizable decoding process that could yield greater efficiency. Yet, in practice, current open-source…

Computation and Language · Computer Science 2025-11-11 Han Peng , Peiyu Liu , Zican Dong , Daixuan Cheng , Junyi Li , Yiru Tang , Shuo Wang , Wayne Xin Zhao

Recent studies have shown that diffusion language models achieve remarkable data efficiency under limited-data constraints, yet the underlying mechanisms remain unclear. In this work, we perform extensive ablation experiments to disentangle…

Computation and Language · Computer Science 2025-10-07 Zitian Gao , Haoming Luo , Lynx Chen , Jason Klein Liu , Ran Tao , Joey Zhou , Bryan Dai

Diffusion language models (DLMs) are a promising avenue for text generation because of their practical properties for tractable, controllable generation. They also have the advantage of not having to predict text autoregressively. However,…

Machine Learning · Computer Science 2024-02-13 Sofia Maria Lo Cicero Vaina , Nikita Balagansky , Daniil Gavrilov

Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve generation…

Computation and Language · Computer Science 2026-05-08 Hongcan Guo , Qinyu Zhao , Yian Zhao , Shen Nie , Rui Zhu , Qiushan Guo , Feng Wang , Tao Yang , Hengshuang Zhao , Guoqiang Wei , Yan Zeng

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer architectures for MDMs remains underexplored. In this paper, we show that…

Machine Learning · Computer Science 2026-05-26 Sanghyun Lee , Chunsan Hong , Seungryong Kim , Jonghyun Lee , Jongho Park , Dongmin Park

Diffusion large language models (DLLMs) have emerged as an alternative to autoregressive (AR) decoding with appealing efficiency and modeling properties, yet their implications for agentic multi-step decision making remain underexplored. We…

This paper presents LLaDA2.0 -- a tuple of discrete diffusion large language models (dLLM) scaling up to 100B total parameters through systematic conversion from auto-regressive (AR) models -- establishing a new paradigm for frontier-scale…

Diffusion large language models (dLLMs) generate text via iterative denoising but consistently underperform on multi-step reasoning. We hypothesize this gap stems from a coordination problem: AR models build coherence token-by-token, while…

Artificial Intelligence · Computer Science 2026-03-17 Earl J St Sauver