Related papers: Dependency-Guided Parallel Decoding in Discrete Di…

Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs

Parallel decoding for diffusion LLMs (dLLMs) is difficult because each denoising step provides only token-wise marginal distributions, while unmasking multiple tokens simultaneously requires accounting for inter-token dependencies. We…

Machine Learning · Computer Science · 2026-03-16 · Bumjun Kim, Dongjae Jeon, Moongyu Jeon, Albert No

Cluster-Level Attention-Guided Parallel Decoding for Masked Diffusion Language Models

Masked diffusion language models (MDLMs) enable parallel decoding by predicting all masked positions at each denoising step, yet existing training-free samplers usually decide which positions to commit at token-level granularity. We revisit…

Machine Learning · Computer Science · 2026-05-29 · Heqiang Qi, Wei Huang, Mingyuan Bai, Xiangming Meng

DAWN: Dependency-Aware Fast Inference for Diffusion LLMs

Diffusion large language models (dLLMs) have shown advantages in text generation, particularly due to their inherent ability for parallel decoding. However, constrained by the quality-speed trade-off, existing inference solutions adopt…

Computation and Language · Computer Science · 2026-02-09 · Lizhuo Luo, Zhuoran Shi, Jiajun Luo, Zhi Wang, Shen Ren, Wenya Wang, Tianwei Zhang

Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding

Autoregressive decoding in large language models (LLMs) requires $\mathcal{O}(n)$ sequential steps for $n$ tokens, fundamentally limiting inference throughput. Recent diffusion-based LLMs (dLLMs) enable parallel token generation through…

Computation and Language · Computer Science · 2025-10-06 · Wenrui Bao, Zhiben Chen, Dan Xu, Yuzhang Shang

dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

Masked diffusion language models (MDLMs) offer the potential for parallel token generation, but most open-source MDLMs decode fewer than 5 tokens per model forward pass even with sophisticated sampling strategies, limiting their parallel…

Machine Learning · Computer Science · 2026-02-09 · Shirui Chen, Jiantao Jiao, Lillian J. Ratliff, Banghua Zhu

dParallel: Learnable Parallel Decoding for dLLMs

Diffusion large language models (dLLMs) have recently drawn considerable attention within the research community as a promising alternative to autoregressive generation, offering parallel token prediction and lower inference latency. Yet,…

Computation and Language · Computer Science · 2025-10-01 · Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, Xinchao Wang

CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credit

Diffusion large language models (dLLMs) generate text through iterative denoising. In commonly adopted parallel decoding schemes, each step confirms only high-confidence positions while remasking the others. By analyzing dLLM denoising…

Computation and Language · Computer Science · 2026-05-27 · Kangyu Wang, Zhiyun Jiang, Haibo Feng, Weijia Zhao, Lin Liu, Jianguo Li, Zhenzhong Lan, Weiyao Lin

Divide and Conquer: Accelerating Diffusion-Based Large Language Models via Adaptive Parallel Decoding

Diffusion-based large language models (dLLMs) have shown promising performance across various reasoning tasks, establishing themselves as an alternative to autoregressive large language models (LLMs). Unlike autoregressive LLMs that…

Computation and Language · Computer Science · 2026-03-02 · Xiangzhong Luo, Yilin An, Zhicheng Yu, Weichen Liu, Xu Yang

Adaptation to Intrinsic Dependence in Diffusion Language Models

Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive (AR) approaches, enabling parallel token generation beyond a rigid left-to-right order. Despite growing empirical success, the theoretical…

Machine Learning · Computer Science · 2026-02-24 · Yunxiao Zhao, Changxiao Cai

Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models

Diffusion-based language models (dLLMs) have emerged as a promising alternative to autoregressive language models, offering the potential for parallel token generation and bidirectional context modeling. However, harnessing this flexibility…

Computation and Language · Computer Science · 2026-05-28 · Jiyeon Kim, Sungik Choi, Yongrae Jo, Moontae Lee, Minjoon Seo

FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion

Diffusion language models offer parallel token generation and inherent bidirectionality, promising more efficient and powerful sequence modeling compared to autoregressive approaches. However, state-of-the-art diffusion models (e.g., Dream…

Computation and Language · Computer Science · 2025-10-10 · Zhanqiu Hu, Jian Meng, Yash Akhauri, Mohamed S. Abdelfattah, Jae-sun Seo, Zhiru Zhang, Udit Gupta

Relaxing Positional Alignment in Masked Diffusion Language Models

Masked diffusion language models (MDLMs) have emerged as a promising alternative to dominant autoregressive approaches. Although they achieve competitive performance on several tasks, a substantial gap remains in open-ended text generation.…

Computation and Language · Computer Science · 2026-02-02 · Mengyu Ye, Ryosuke Takahashi, Keito Kudo, Jun Suzuki

PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding

Diffusion large language models (dLLMs) generate text by iteratively denoising masked token sequences. Although dLLMs can predict all masked positions in parallel within each step, the large number of denoising iterations still makes…

Computation and Language · Computer Science · 2026-05-18 · Shengyin Sun, Yiming Li, Renxi Liu, Xinqi Li, Hui-Ling Zhen, Weizhe Lin, Chen Chen, Xianzhi Yu, Mingxuan Yuan, Chen Ma

Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding

Discrete diffusion language models improve generation efficiency through parallel token prediction, but standard $X_0$ prediction methods introduce factorization errors by approximating the clean token posterior with independent token-wise…

Computation and Language · Computer Science · 2026-05-15 · Xun Fang, Yunchen Li, Hang Yuan, Zhou Yu

DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

This paper shows how diffusion language models (DLMs) can be used as effective and efficient retrievers. Existing DLM-based retrievers (e.g., DiffEmbed) follow BERT-style encoding, representing each query or passage as a single mean-pooled…

Information Retrieval · Computer Science · 2026-05-29 · Shuai Wang, Yu Yin, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention

Masked Diffusion Language Models (MDLMs) enable parallel token decoding, providing a promising alternative to the sequential nature of autoregressive generation. However, their iterative denoising process remains computationally expensive…

Computation and Language · Computer Science · 2026-03-10 · Younjoo Lee, Junghoo Lee, Seungkyun Dan, Jaiyoung Park, Jung Ho Ahn

Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models

Masked diffusion language models (MDLMs) promise fast, non-autoregressive text generation, yet existing samplers, which pick tokens to unmask based on model confidence, ignore interactions when unmasking multiple positions in parallel and…

Computation and Language · Computer Science · 2026-05-26 · Omer Luxembourg, Haim Permuter, Eliya Nachmani

Enabling Approximate Joint Sampling in Diffusion LMs

In autoregressive language models, each token is sampled by conditioning on all the past tokens; the overall string has thus been sampled from the correct underlying joint distribution represented by the model. In contrast, masked diffusion…

Computation and Language · Computer Science · 2026-02-03 · Parikshit Bansal, Sujay Sanghavi

LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection

Diffusion Language Models (dLLMs) have garnered significant attention for their potential in highly parallel processing. The parallel capabilities of existing dLLMs stem from the assumption of conditional independence at high confidence…

Machine Learning · Computer Science · 2026-05-13 · Haohui Zhang, Zhiye Wang, Xiaoying Gan, Xinbing Wang, Bo Jiang

Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

In this work, we propose Dimple, the first Discrete Diffusion Multimodal Large Language Model (DMLLM). We observe that training with a purely discrete diffusion approach leads to significant training instability, suboptimal performance, and…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-27 · Runpeng Yu, Xinyin Ma, Xinchao Wang