Related papers: dParallel: Learnable Parallel Decoding for dLLMs

Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding

Autoregressive decoding in large language models (LLMs) requires $\mathcal{O}(n)$ sequential steps for $n$ tokens, fundamentally limiting inference throughput. Recent diffusion-based LLMs (dLLMs) enable parallel token generation through…

Computation and Language · Computer Science 2025-10-06 Wenrui Bao , Zhiben Chen , Dan Xu , Yuzhang Shang

DMax: Aggressive Parallel Decoding for dLLMs

We present DMax, a new paradigm for efficient diffusion language models (dLLMs). It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked…

Machine Learning · Computer Science 2026-05-18 Zigeng Chen , Gongfan Fang , Xinyin Ma , Ruonan Yu , Xinchao Wang

d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation

Diffusion large language models (dLLMs) offer capabilities beyond those of autoregressive (AR) LLMs, such as parallel decoding and random-order generation. However, realizing these benefits in practice is non-trivial, as dLLMs inherently…

Machine Learning · Computer Science 2026-01-30 Yu-Yang Qian , Junda Su , Lanxiang Hu , Peiyuan Zhang , Zhijie Deng , Peng Zhao , Hao Zhang

CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credit

Diffusion large language models (dLLMs) generate text through iterative denoising. In commonly adopted parallel decoding schemes, each step confirms only high-confidence positions while remasking the others. By analyzing dLLM denoising…

Computation and Language · Computer Science 2026-05-27 Kangyu Wang , Zhiyun Jiang , Haibo Feng , Weijia Zhao , Lin Liu , Jianguo Li , Zhenzhong Lan , Weiyao Lin

Divide and Conquer: Accelerating Diffusion-Based Large Language Models via Adaptive Parallel Decoding

Diffusion-based large language models (dLLMs) have shown promising performance across various reasoning tasks, establishing themselves as an alternative to autoregressive large language models (LLMs). Unlike autoregressive LLMs that…

Computation and Language · Computer Science 2026-03-02 Xiangzhong Luo , Yilin An , Zhicheng Yu , Weichen Liu , Xu Yang

dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

Masked diffusion language models (MDLMs) offer the potential for parallel token generation, but most open-source MDLMs decode fewer than 5 tokens per model forward pass even with sophisticated sampling strategies, limiting their parallel…

Machine Learning · Computer Science 2026-02-09 Shirui Chen , Jiantao Jiao , Lillian J. Ratliff , Banghua Zhu

Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

In this work, we propose Dimple, the first Discrete Diffusion Multimodal Large Language Model (DMLLM). We observe that training with a purely discrete diffusion approach leads to significant training instability, suboptimal performance, and…

Computer Vision and Pattern Recognition · Computer Science 2025-05-27 Runpeng Yu , Xinyin Ma , Xinchao Wang

$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive generation by enabling parallel token prediction. However, practical dLLM decoding still suffers from high inference latency, which limits…

Computation and Language · Computer Science 2026-04-22 Zhenbang Du , Kejing Xia , Xinrui Zhong , Yonggan Fu , Nicolai Oswald , Binfei Ji , Brucek Khailany , Pavlo Molchanov , Yingyan Lin

ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs

While most autoregressive LLMs are constrained to one-by-one decoding, diffusion LLMs (dLLMs) have attracted growing interest for their potential to dramatically accelerate inference through parallel decoding. Despite this promise, the…

Machine Learning · Computer Science 2025-10-07 Wonjun Kang , Kevin Galim , Seunghyuk Oh , Minjae Lee , Yuchen Zeng , Shuibai Zhang , Coleman Hooper , Yuezhou Hu , Hyung Il Koo , Nam Ik Cho , Kangwook Lee

Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models

Discrete diffusion language models (dLLMs) accelerate text generation by unmasking multiple tokens in parallel. However, parallel decoding introduces a distributional mismatch: it approximates the joint conditional using a fully factorized…

Computation and Language · Computer Science 2026-04-06 Liran Ringel , Ameen Ali , Yaniv Romano

CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models

Autoregressive large language models achieve strong results on many benchmarks, but decoding remains fundamentally latency-limited by sequential dependence on previously generated tokens. Diffusion language models (DLMs) promise parallel…

Computation and Language · Computer Science 2026-01-06 Yihao Liang , Ze Wang , Hao Chen , Ximeng Sun , Jialian Wu , Xiaodong Yu , Jiang Liu , Emad Barsoum , Zicheng Liu , Niraj K. Jha

AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism

Large language models (LLMs) are increasingly used for long-content generation (e.g., long Chain-of-Thought reasoning) where decoding efficiency becomes a critical bottleneck: Autoregressive decoding is inherently limited by its sequential…

Computation and Language · Computer Science 2025-06-05 Zhepei Wei , Wei-Lin Chen , Xinyu Zhu , Yu Meng

Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration

Large language models (LLMs) have recently shown remarkable performance across a wide range of tasks. However, the substantial number of parameters in LLMs contributes to significant latency during model inference. This is particularly…

Computation and Language · Computer Science 2024-04-19 Pengfei Wu , Jiahao Liu , Zhuocheng Gong , Qifan Wang , Jinpeng Li , Jingang Wang , Xunliang Cai , Dongyan Zhao

dVoting: Fast Voting for dLLMs

Diffusion Large Language Models (dLLMs) represent a new paradigm beyond autoregressive modeling, offering competitive performance while naturally enabling a flexible decoding process. Specifically, dLLMs can generate tokens at arbitrary…

Computation and Language · Computer Science 2026-02-13 Sicheng Feng , Zigeng Chen , Xinyin Ma , Gongfan Fang , Xinchao Wang

Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive (AR) LLMs for text generation, with the potential to decode multiple tokens in a single iteration. However, none of the existing open-source…

Machine Learning · Computer Science 2025-08-14 Xu Wang , Chenkai Xu , Yijie Jin , Jiachun Jin , Hao Zhang , Zhijie Deng

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

Diffusion large language models (dLLMs) offer a promising paradigm for parallel text generation, but in practice they face an accuracy-parallelism trade-off, where increasing tokens per forward (TPF) often degrades generation quality.…

Computation and Language · Computer Science 2026-05-12 Haoyang Zhou , Li Kong , Shijie Ren , Xiting Wang , Shuang Liang , Guowei Wang , Zhenxuan Pan

Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference

The auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance. While recent research has investigated various speculative decoding techniques for multi-token generation, these…

Machine Learning · Computer Science 2025-10-01 Hao Mark Chen , Wayne Luk , Ka Fai Cedric Yiu , Rui Li , Konstantin Mishchenko , Stylianos I. Venieris , Hongxiang Fan

Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models

Diffusion large language models (dLLMs) have recently emerged as a promising alternative to autoregressive (AR) models, offering advantages such as accelerated parallel decoding and bidirectional context modeling. However, the vanilla…

Computation and Language · Computer Science 2025-10-07 Runchu Tian , Junxia Cui , Xueqiang Xu , Feng Yao , Jingbo Shang

DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention

Masked Diffusion Language Models (MDLMs) enable parallel token decoding, providing a promising alternative to the sequential nature of autoregressive generation. However, their iterative denoising process remains computationally expensive…

Computation and Language · Computer Science 2026-03-10 Younjoo Lee , Junghoo Lee , Seungkyun Dan , Jaiyoung Park , Jung Ho Ahn

LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection

Diffusion Language Models (dLLMs) have garnered significant attention for their potential in highly parallel processing. The parallel capabilities of existing dLLMs stem from the assumption of conditional independence at high confidence…

Machine Learning · Computer Science 2026-05-13 Haohui Zhang , Zhiye Wang , Xiaoying Gan , Xinbing Wang , Bo Jiang