Related papers: Likelihood-Based Diffusion Language Models

Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based continuous…

Computation and Language · Computer Science 2026-05-19 Zhihan Yang, Wei Guo, Shuibai Zhang, Subham Sekhar Sahoo, Yongxin Chen, Arash Vahdat, Morteza Mardani, John Thickstun

Scaling Beyond Masked Diffusion Language Models

Diffusion language models are a promising alternative to autoregressive models due to their potential for faster generation. Among discrete diffusion approaches, masked diffusion currently dominates, largely driven by strong perplexity on…

Machine Learning · Computer Science 2026-02-17 Subham Sekhar Sahoo, Jean-Marie Lemercier, Zhihan Yang, Justin Deschenaux, Jingyu Liu, John Thickstun, Ante Jukic

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to…

Computation and Language · Computer Science 2025-06-03 Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong

Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalable capabilities of large language models. Despite their potential, it remains elusive whether diffusion language…

Computation and Language · Computer Science 2025-02-25 Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Quanquan Gu

Large Language Diffusion Models

The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised…

Computation and Language · Computer Science 2025-10-21 Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li

Scaling Behavior of Discrete Diffusion Language Models

Modern LLM pre-training consumes vast amounts of compute and training data, making the scaling behavior, or scaling laws, of different models a key distinguishing factor. Discrete diffusion language models (DLMs) have been proposed as an…

Machine Learning · Computer Science 2026-02-17 Dimitri von Rütte, Janis Fluri, Omead Pooladzandi, Bernhard Schölkopf, Thomas Hofmann, Antonio Orvieto

Continuous Diffusion Model for Language Modeling

Diffusion models have emerged as a promising alternative to autoregressive models in modeling discrete categorical data. However, diffusion models that directly work on discrete data space fail to fully exploit the power of iterative…

Machine Learning · Computer Science 2025-10-24 Jaehyeong Jo, Sung Ju Hwang

Dream 7B: Diffusion Large Language Models

We introduce Dream 7B, the most powerful open diffusion large language model to date. Unlike autoregressive (AR) models that generate tokens sequentially, Dream 7B employs discrete diffusion modeling to refine sequences in parallel through…

Computation and Language · Computer Science 2025-08-22 Jiacheng Ye, Zhihui Xie, Lin Zheng, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong

Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

Large language model (LLM)-based embedding models, benefiting from large scale pre-training and post-training, have begun to surpass BERT and T5-based models on general-purpose text embedding tasks such as document retrieval. However, a…

Computation and Language · Computer Science 2025-05-22 Siyue Zhang, Yilun Zhao, Liyuan Geng, Arman Cohan, Anh Tuan Luu, Chen Zhao

Lost in Diffusion: Uncovering Hallucination Patterns and Failure Modes in Diffusion Large Language Models

While Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive paradigm comparable to autoregressive (AR) models, their faithfulness, specifically regarding hallucination, remains largely underexplored. To…

Computation and Language · Computer Science 2026-04-14 Zhengnan Guo, Fei Tan

Diffusion Beats Autoregressive in Data-Constrained Settings

Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages…

Machine Learning · Computer Science 2025-10-28 Mihir Prabhudesai, Mengning Wu, Amir Zadeh, Katerina Fragkiadaki, Deepak Pathak

Towards Latent Diffusion Suitable For Text

Language diffusion models aim to improve sampling speed and coherence over autoregressive LLMs. We introduce Neural Flow Diffusion Models for language generation, an extension of NFDM that enables the straightforward application of…

Computation and Language · Computer Science 2026-01-26 Nesta Midavaine, Christian A. Naesseth, Grigory Bartosh

Simple and Effective Masked Diffusion Language Models

While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple masked discrete…

Computation and Language · Computer Science 2024-11-12 Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, Volodymyr Kuleshov

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work,…

Machine Learning · Computer Science 2025-05-20 Marianne Arriola, Aaron Gokaslan, Justin T. Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo, Volodymyr Kuleshov

Generalized Interpolating Discrete Diffusion

While state-of-the-art language models achieve impressive results through next-token prediction, they have inherent limitations such as the inability to revise already generated tokens. This has prompted exploration of alternative…

Computation and Language · Computer Science 2025-06-10 Dimitri von Rütte, Janis Fluri, Yuhui Ding, Antonio Orvieto, Bernhard Schölkopf, Thomas Hofmann

Diffuse Thinking: Exploring Diffusion Language Models as Efficient Thought Proposers for Reasoning

In recent years, large language models (LLMs) have witnessed remarkable advancements, with the test-time scaling law consistently enhancing the reasoning capabilities. Through systematic evaluation and exploration of a diverse spectrum of…

Computation and Language · Computer Science 2025-11-03 Chenyang Shao, Sijian Ren, Fengli Xu, Yong Li

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation. Whereas, as a way inherently built for continuous data, existing diffusion models still have…

Computation and Language · Computer Science 2023-04-11 Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

A Survey of Diffusion Models in Natural Language Processing

This survey paper provides a comprehensive review of the use of diffusion models in natural language processing (NLP). Diffusion models are a class of mathematical models that aim to capture the diffusion of information or signals across a…

Computation and Language · Computer Science 2023-06-16 Hao Zou, Zae Myung Kim, Dongyeop Kang

A Survey on Diffusion Language Models

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated within the left-to-right autoregressive (AR)…

Computation and Language · Computer Science 2025-06-04 Siyan Zhao, Devaansh Gupta, Qinqing Zheng, Aditya Grover