Related papers: DOS: Dependency-Oriented Sampler for Masked Diffus…

Adaptation to Intrinsic Dependence in Diffusion Language Models

Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive (AR) approaches, enabling parallel token generation beyond a rigid left-to-right order. Despite growing empirical success, the theoretical…

Machine Learning · Computer Science 2026-02-24 Yunxiao Zhao, Changxiao Cai

Confidence-Based Decoding is Provably Efficient for Diffusion Language Models

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models for language modeling, allowing flexible generation order and parallel generation of multiple tokens. However, this flexibility…

Machine Learning · Computer Science 2026-03-24 Changxiao Cai, Gen Li

No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models

Masked diffusion language models (MDLMs) are trained to in-fill positions in randomly masked sequences, in contrast to next-token prediction models. Discussions around MDLMs focus on two benefits: (1) any-order decoding and (2) multi-token…

Machine Learning · Computer Science 2025-10-24 Zachary Horvitz, Raghav Singhal, Hao Zou, Carles Domingo-Enrich, Zhou Yu, Rajesh Ranganath, Kathleen McKeown

Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models

Masked diffusion language models (MDLMs) promise fast, non-autoregressive text generation, yet existing samplers, which pick tokens to unmask based on model confidence, ignore interactions when unmasking multiple positions in parallel and…

Computation and Language · Computer Science 2026-05-26 Omer Luxembourg, Haim Permuter, Eliya Nachmani

Attention-Based Sampler for Diffusion Language Models

Auto-regressive models (ARMs) have established a dominant paradigm in language modeling. However, their strictly sequential decoding paradigm imposes fundamental constraints on both inference efficiency and modeling flexibility. To address…

Computation and Language · Computer Science 2026-04-13 Yuyan Zhou, Kai Syun Hou, Weiyu Chen, James Kwok

Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step

Masked diffusion language models (MDLMs) have recently emerged as a promising alternative to autoregressive (AR) language models, offering properties such as parallel decoding, flexible generation orders, and the potential for fewer…

Computation and Language · Computer Science 2025-09-30 Jingyi Yang, Guanxu Chen, Xuhao Hu, Jing Shao

DAWN: Dependency-Aware Fast Inference for Diffusion LLMs

Diffusion large language models (dLLMs) have shown advantages in text generation, particularly due to their inherent ability for parallel decoding. However, constrained by the quality-speed trade-off, existing inference solutions adopt…

Computation and Language · Computer Science 2026-02-09 Lizhuo Luo, Zhuoran Shi, Jiajun Luo, Zhi Wang, Shen Ren, Wenya Wang, Tianwei Zhang

Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs

Parallel decoding for diffusion LLMs (dLLMs) is difficult because each denoising step provides only token-wise marginal distributions, while unmasking multiple tokens simultaneously requires accounting for inter-token dependencies. We…

Machine Learning · Computer Science 2026-03-16 Bumjun Kim, Dongjae Jeon, Moongyu Jeon, Albert No

Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes

While Masked Diffusion Language Models (MDLMs) relying on token masking and unmasking have shown promise in language modeling, their computational efficiency and generation flexibility remain constrained by the masking paradigm. In this…

Computation and Language · Computer Science 2026-03-26 Fangyu Ding, Ding Ding, Sijin Chen, Kaibo Wang, Peng Xu, Zijin Feng, Haoli Bai, Kai Han, Youliang Yan, Binhang Yuan, Jiacheng Sun

Diffusion Language Models are Provably Optimal Parallel Samplers

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive models for faster inference via parallel token generation. We provide a rigorous foundation for this advantage by formalizing a model of parallel…

Machine Learning · Computer Science 2026-01-01 Haozhe Jiang, Nika Haghtalab, Lijie Chen

FOCUS: DLLMs Know How to Tame Their Compute Bound

Diffusion Large Language Models (DLLMs) offer a compelling alternative to Auto-Regressive models, but their deployment is constrained by high decoding cost. In this work, we identify a key inefficiency in DLLM decoding: while computation is…

Machine Learning · Computer Science 2026-02-02 Kaihua Liang, Xin Tan, An Zhong, Hong Xu, Marco Canini

Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models

Diffusion Language Models (DLMs) offer a promising alternative for language modeling by enabling parallel decoding through iterative refinement. However, most DLMs rely on hard binary masking and discrete token assignments, which hinder the…

Computation and Language · Computer Science 2026-01-19 Linhao Zhong, Linyu Wu, Bozhen Fang, Tianjian Feng, Chenchen Jing, Wen Wang, Jiaheng Zhang, Hao Chen, Chunhua Shen

A Survey on Diffusion Language Models

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models

Large Language Models (LLMs) have achieved state-of-the-art performance on a broad range of Natural Language Processing (NLP) tasks, including document processing and code generation. Autoregressive Language Models (ARMs), which generate…

Machine Learning · Computer Science 2025-12-16 Minseo Kim, Coleman Hooper, Aditya Tomar, Chenfeng Xu, Mehrdad Farajtabar, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

Mask Is What DLLM Needs: A Masked Data Training Paradigm for Diffusion LLMs

Discrete diffusion models offer global context awareness and flexible parallel generation. However, uniform random noise schedulers in standard DLLM training overlook the highly non-uniform information density inherent in real-world…

Machine Learning · Computer Science 2026-03-18 Linrui Ma, Yufei Cui, Kai Han, Yunhe Wang

DualDiffusion: A Speculative Decoding Strategy for Masked Diffusion Models

Masked Diffusion Models (MDMs) offer a promising alternative to autoregressive language models by enabling parallel token generation and bidirectional context modeling. However, their inference speed is significantly limited by the…

Machine Learning · Computer Science 2026-04-08 Satyam Goyal, Kushal Patel, Tanush Mittal, Arjun Laxman

DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention

Masked Diffusion Language Models (MDLMs) enable parallel token decoding, providing a promising alternative to the sequential nature of autoregressive generation. However, their iterative denoising process remains computationally expensive…

Computation and Language · Computer Science 2026-03-10 Younjoo Lee, Junghoo Lee, Seungkyun Dan, Jaiyoung Park, Jung Ho Ahn

Consistent Diffusion Language Models

Diffusion language models (DLMs) are an attractive alternative to autoregressive models because they promise sublinear-time, parallel generation, yet practical gains remain elusive as high-quality samples still demand hundreds of refinement…

Machine Learning · Computer Science 2026-05-04 Hasan Amin, Yuan Gao, Yaser Souri, Subhojit Som, Ming Yin, Rajiv Khanna, Xia Song

Divide and Conquer: Accelerating Diffusion-Based Large Language Models via Adaptive Parallel Decoding

Diffusion-based large language models (dLLMs) have shown promising performance across various reasoning tasks, establishing themselves as an alternative to autoregressive large language models (LLMs). Unlike autoregressive LLMs that…

Computation and Language · Computer Science 2026-03-02 Xiangzhong Luo, Yilin An, Zhicheng Yu, Weichen Liu, Xu Yang

Dynamic Expert Sharing: Decoupling Memory from Parallelism in Mixture-of-Experts Diffusion LLMs

Among parallel decoding paradigms, diffusion large language models (dLLMs) have emerged as a promising candidate that balances generation quality and throughput. However, their integration with Mixture-of-Experts (MoE) architectures is…

Machine Learning · Computer Science 2026-02-03 Hao Mark Chen, Zhiwen Mo, Royson Lee, Qianzhou Wang, Da Li, Shell Xu Hu, Wayne Luk, Timothy Hospedales, Hongxiang Fan