Related papers: What Makes Diffusion Language Models Super Data Le…

200 papers

Under strictly controlled pre-training settings, we observe a Crossover: when unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The crossover shifts later…

Machine Learning · Computer Science · 2025-11-06 · Jinjie Ni, Qian Liu, Longxu Dou, Chao Du, Zili Wang, Hang Yan, Tianyu Pang, Michael Qizhe Shieh

Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages…

Machine Learning · Computer Science · 2025-10-28 · Mihir Prabhudesai, Mengning Wu, Amir Zadeh, Katerina Fragkiadaki, Deepak Pathak

Diffusion language models have emerged as a promising approach for text generation. One would naturally expect this method to be an efficient replacement for autoregressive models since multiple tokens can be sampled in parallel during each…

Machine Learning · Computer Science · 2025-06-10 · Guhao Feng, Yihan Geng, Jian Guan, Wei Wu, Liwei Wang, Di He

The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalable capabilities of large language models. Despite their potential, it remains elusive whether diffusion language…

Computation and Language · Computer Science · 2025-02-25 · Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Quanquan Gu

We present a masked diffusion language modeling framework for data-efficient training for the BabyLM 2025 Challenge. Our approach applies diffusion training objectives to language modeling under strict data constraints, incorporating…

Computation and Language · Computer Science · 2025-09-08 · Despoina Kosmopoulou, Efthymios Georgiou, Vaggelis Dorovatas, Georgios Paraskevopoulos, Alexandros Potamianos

Diffusion language models (DLMs) have emerged as a promising alternative to the long-dominant autoregressive (AR) paradigm, offering a parallelizable decoding process that could yield greater efficiency. Yet, in practice, current open-source…

Computation and Language · Computer Science · 2025-11-11 · Han Peng, Peiyu Liu, Zican Dong, Daixuan Cheng, Junyi Li, Yiru Tang, Shuo Wang, Wayne Xin Zhao

Diffusion models offer appealing properties for language generation, such as parallel decoding and iterative refinement, but the discrete and highly structured nature of text challenges the direct application of diffusion principles. In…

Computation and Language · Computer Science · 2025-12-30 · Ziqi Jin, Bin Wang, Xiang Lin, Lidong Bing, Aixin Sun

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models (ARMs) for language modeling. However, MDMs are known to learn substantially more slowly than ARMs, which may become problematic when scaling…

Machine Learning · Computer Science · 2026-05-14 · Chunsan Hong, Sanghyun Lee, Chieh-Hsin Lai, Satoshi Hayakawa, Yuhta Takida, Yuki Mitsufuji, Seungryong Kim, Jong Chul Ye

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science · 2025-12-08 · Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple masked discrete…

Discrete diffusion models offer global context awareness and flexible parallel generation. However, uniform random noise schedulers in standard DLLM training overlook the highly non-uniform information density inherent in real-world…

Machine Learning · Computer Science · 2026-03-18 · Linrui Ma, Yufei Cui, Kai Han, Yunhe Wang

Diffusion models have gained traction as powerful algorithms for synthesizing high-quality images. Central to these algorithms is the diffusion process, a set of equations which maps data to noise in a way that can significantly affect…

Machine Learning · Computer Science · 2024-11-12 · Subham Sekhar Sahoo, Aaron Gokaslan, Chris De Sa, Volodymyr Kuleshov

Discrete diffusion models have recently become competitive with autoregressive models for language modeling, even outperforming them on reasoning tasks requiring planning and global coherence, but they require more computation at inference…

Machine Learning · Computer Science · 2026-02-04 · Andre He, Sean Welleck, Daniel Fried

Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While…

Computation and Language · Computer Science · 2023-08-24 · Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari S. Morcos

Transformer-based pretrained language models (PLMs) have achieved great success in modern NLP. An important advantage of PLMs is good out-of-distribution (OOD) robustness. Recently, diffusion models have attracted a lot of work to apply…

Computation and Language · Computer Science · 2023-07-27 · Huazheng Wang, Daixuan Cheng, Haifeng Sun, Jingyu Wang, Qi Qi, Jianxin Liao, Jing Wang, Cong Liu

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer architectures for MDMs remains underexplored. In this paper, we show that…

Machine Learning · Computer Science · 2026-05-26 · Sanghyun Lee, Chunsan Hong, Seungryong Kim, Jonghyun Lee, Jongho Park, Dongmin Park

Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have…

Computation and Language · Computer Science · 2023-11-08 · Justin Lovelace, Varsha Kishore, Chao Wan, Eliot Shekhtman, Kilian Q. Weinberger

Diffusion (Large) Language Models (dLLMs) now match the downstream performance of their autoregressive counterparts on many tasks, while holding the promise of being more efficient during inference. One critical design aspect of dLLMs is…

Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation. Whereas, as a way inherently built for continuous data, existing diffusion models still have…

Computation and Language · Computer Science · 2023-04-11 · Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

Large Language Models (LLMs) have achieved state-of-the-art performance on a broad range of Natural Language Processing (NLP) tasks, including document processing and code generation. Autoregressive Language Models (ARMs), which generate…
