Related papers: Diffusion Language Models Know the Answer Before D…

Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

In this work, we propose Dimple, the first Discrete Diffusion Multimodal Large Language Model (DMLLM). We observe that training with a purely discrete diffusion approach leads to significant training instability, suboptimal performance, and…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Runpeng Yu, Xinyin Ma, Xinchao Wang

Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models

Diffusion-based language models (dLLMs) have emerged as a promising alternative to autoregressive language models, offering the potential for parallel token generation and bidirectional context modeling. However, harnessing this flexibility…

Computation and Language · Computer Science 2026-05-28 Jiyeon Kim, Sungik Choi, Yongrae Jo, Moontae Lee, Minjoon Seo

Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive (AR) LLMs for text generation, with the potential to decode multiple tokens in a single iteration. However, none of the existing open-source…

Machine Learning · Computer Science 2025-08-14 Xu Wang, Chenkai Xu, Yijie Jin, Jiachun Jin, Hao Zhang, Zhijie Deng

A Survey on Diffusion Language Models

Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent…

Computation and Language · Computer Science 2025-12-08 Tianyi Li, Mingda Chen, Bowei Guo, Zhiqiang Shen

FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion

Diffusion language models offer parallel token generation and inherent bidirectionality, promising more efficient and powerful sequence modeling compared to autoregressive approaches. However, state-of-the-art diffusion models (e.g., Dream…

Computation and Language · Computer Science 2025-10-10 Zhanqiu Hu, Jian Meng, Yash Akhauri, Mohamed S. Abdelfattah, Jae-sun Seo, Zhiru Zhang, Udit Gupta

Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States

Autoregressive (AR) models remain the standard for natural language generation but still suffer from high latency due to strictly sequential decoding. Recent diffusion-inspired approaches, such as LLaDA and Dream, mitigate this by…

Computation and Language · Computer Science 2025-10-16 Qinglin Zhu, Yizhen Yao, Runcong Zhao, Yanzheng Xiang, Amrutha Saseendran, Chen Jin, Philip Teare, Bin Liang, Yulan He, Lin Gui

Corrective Diffusion Language Models

While Diffusion Language Models (DLMs) are theoretically well-suited for iterative refinement due to their non-causal structure, they often fail to reliably revise incorrect tokens in practice. The key challenge lies in the model's…

Machine Learning · Computer Science 2026-01-30 Shuibai Zhang, Fred Zhangzhi Peng, Yiheng Zhang, Jin Pan, Grigorios G. Chrysos

Diffusion Language Models Are Natively Length-Aware

Unlike autoregressive language models, which terminate variable-length generation upon predicting an End-of-Sequence (EoS) token, Diffusion Language Models (DLMs) operate over a fixed maximum-length context window for a predetermined number…

Computation and Language · Computer Science 2026-03-09 Vittorio Rossi, Giacomo Cirò, Davide Beltrame, Luca Gandolfi, Paul Röttger, Dirk Hovy

Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding

Autoregressive decoding in large language models (LLMs) requires $\mathcal{O}(n)$ sequential steps for $n$ tokens, fundamentally limiting inference throughput. Recent diffusion-based LLMs (dLLMs) enable parallel token generation through…

Computation and Language · Computer Science 2025-10-06 Wenrui Bao, Zhiben Chen, Dan Xu, Yuzhang Shang

Diffuse Thinking: Exploring Diffusion Language Models as Efficient Thought Proposers for Reasoning

In recent years, large language models (LLMs) have witnessed remarkable advancements, with the test-time scaling law consistently enhancing the reasoning capabilities. Through systematic evaluation and exploration of a diverse spectrum of…

Computation and Language · Computer Science 2025-11-03 Chenyang Shao, Sijian Ren, Fengli Xu, Yong Li

Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models

Large Language Models (LLMs) have achieved state-of-the-art performance on a broad range of Natural Language Processing (NLP) tasks, including document processing and code generation. Autoregressive Language Models (ARMs), which generate…

Machine Learning · Computer Science 2025-12-16 Minseo Kim, Coleman Hooper, Aditya Tomar, Chenfeng Xu, Mehrdad Farajtabar, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

How Efficient Are Diffusion Language Models? A Critical Examination of Efficiency Evaluation Practices

Diffusion language models (DLMs) have emerged as a promising alternative to the long-dominant autoregressive (AR) paradigm, offering a parallelizable decoding process that could yield greater efficiency. Yet, in practice, current open-source…

Computation and Language · Computer Science 2025-11-11 Han Peng, Peiyu Liu, Zican Dong, Daixuan Cheng, Junyi Li, Yiru Tang, Shuo Wang, Wayne Xin Zhao

DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models

Diffusion-based decoding has recently emerged as an appealing alternative to autoregressive (AR) generation, offering the potential to update multiple tokens in parallel and reduce latency. However, diffusion vision language models (dVLMs)…

Computer Vision and Pattern Recognition · Computer Science 2026-04-01 Lunbin Zeng, Jingfeng Yao, Bencheng Liao, Hongyuan Tao, Wenyu Liu, Xinggang Wang

Introspective Diffusion Language Models

Diffusion language models promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do…

Artificial Intelligence · Computer Science 2026-04-14 Yifan Yu, Yuqing Jian, Junxiong Wang, Zhongzhu Zhou, Donglin Zhuang, Xinyu Fang, Sri Yanamandra, Xiaoxia Wu, Qingyang Wu, Shuaiwen Leon Song, Tri Dao, Ben Athiwaratkun, James Zou, Fan Lai, Chenfeng Xu

Divide and Conquer: Accelerating Diffusion-Based Large Language Models via Adaptive Parallel Decoding

Diffusion-based large language models (dLLMs) have shown promising performance across various reasoning tasks, establishing themselves as an alternative to autoregressive large language models (LLMs). Unlike autoregressive LLMs that…

Computation and Language · Computer Science 2026-03-02 Xiangzhong Luo, Yilin An, Zhicheng Yu, Weichen Liu, Xu Yang

Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles

Diffusion-based language models (dLLMs) have emerged as a promising alternative to traditional autoregressive LLMs by enabling parallel token generation and significantly reducing inference latency. However, existing sampling strategies for…

Computation and Language · Computer Science 2026-04-01 Qingyan Wei, Yaojie Zhang, Zhiyuan Liu, Puyu Zeng, Yuxuan Wang, Biqing Qi, Dongrui Liu, Linfeng Zhang

Diffusion Language Models Generation Can Be Halted Early

Diffusion Language Models (DLMs) are a promising avenue for text generation due to their practical properties on tractable controllable generation. They also have the advantage of not having to predict text autoregressively. However,…

Machine Learning · Computer Science 2024-02-13 Sofia Maria Lo Cicero Vaina, Nikita Balagansky, Daniil Gavrilov

Lost in Diffusion: Uncovering Hallucination Patterns and Failure Modes in Diffusion Large Language Models

While Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive paradigm comparable to autoregressive (AR) models, their faithfulness, specifically regarding hallucination, remains largely underexplored. To…

Computation and Language · Computer Science 2026-04-14 Zhengnan Guo, Fei Tan

Diffusion Language Models are Provably Optimal Parallel Samplers

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive models for faster inference via parallel token generation. We provide a rigorous foundation for this advantage by formalizing a model of parallel…

Machine Learning · Computer Science 2026-01-01 Haozhe Jiang, Nika Haghtalab, Lijie Chen

Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers

Diffusion Large Language Models (DLLMs) promise fast parallel generation, yet open-source DLLMs still face a severe quality-speed trade-off: accelerating decoding by revealing multiple tokens often causes substantial quality degradation. We…

Computation and Language · Computer Science 2026-05-19 Fanqin Zeng, Feng Hong, Geng Yu, Huangjie Zheng, Xiaofeng Cao, Ya Zhang, Bo Han, Yanfeng Wang, Jiangchao Yao