Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models

Machine Learning 2026-02-04 v3 Artificial Intelligence

Abstract

Diffusion Large Language Models (DLLMs) have emerged as a new paradigm of language modeling beyond autoregressive next-token prediction. Thanks to their modeling foundations, DLLMs have great potential for efficient inference through parallel decoding algorithms, which enable multi-token prediction. However, high generation quality often requires the number of decoding steps to equal the sequence length, which amounts to one-token-per-step decoding, and existing parallel decoding algorithms follow suboptimal decoding paths, so they gain inference speed at the cost of non-negligible performance degradation. To overcome this challenge, we introduce Free Draft-and-Verification (FreeDave), a novel fast decoding algorithm tailored for DLLMs that achieves lossless parallel decoding without any model modification or extra modules. Specifically, we propose an algorithm of parallel-decoded candidate generation and verification, which is theoretically guaranteed to use the fewest model forward calls to reproduce the same sequence generated by one-token-per-step decoding. In extensive evaluations on math reasoning and code generation benchmarks across different DLLMs, FreeDave is shown to accelerate inference by up to 2.83× without performance degradation.
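
To make the idea of candidate generation and verification concrete, the following is a minimal toy sketch in Python. It is not the paper's implementation: `toy_model`, `sequential_decode`, `draft_and_verify_decode`, and the `draft_size` parameter are hypothetical names introduced here for illustration, the "model" is a small deterministic function rather than a DLLM, and a batched evaluation of several candidates is assumed to count as one forward call. The sketch only illustrates how a model's own predictions can be drafted several tokens ahead and then checked so that the result matches one-token-per-step decoding.

```python
MASK = -1  # placeholder id for a still-masked position (assumption for this toy)


def toy_model(seq):
    """Hypothetical stand-in for one DLLM forward pass.

    Returns {position: (token, confidence)} for every masked position.
    The predicted token depends on the left neighbour once it is unmasked,
    so predictions can change as decoding proceeds.
    """
    n = len(seq)
    preds = {}
    for i, t in enumerate(seq):
        if t != MASK:
            continue
        left = seq[i - 1] if i > 0 and seq[i - 1] != MASK else 0
        preds[i] = ((7 * i + left) % 5, ((5 * i) % n + 1) / (n + 1))
    return preds


def top1(preds):
    """Most confident masked position and its token (ties go to the lowest index)."""
    pos = max(preds, key=lambda i: (preds[i][1], -i))
    return pos, preds[pos][0]


def sequential_decode(model, length):
    """Reference decoding: unmask exactly one token per forward call."""
    seq, calls = [MASK] * length, 0
    while MASK in seq:
        pos, tok = top1(model(seq))
        calls += 1
        seq[pos] = tok
    return seq, calls


def draft_and_verify_decode(model, length, draft_size=4):
    """Draft up to `draft_size` tokens from one prediction, then verify them."""
    seq = [MASK] * length
    preds, calls = model(seq), 1
    while MASK in seq:
        # Draft: nested candidates that unmask the top-1, top-2, ... positions.
        ranked = sorted(preds, key=lambda i: (-preds[i][1], i))[:draft_size]
        candidates, cur = [], seq
        for pos in ranked:
            cur = list(cur)
            cur[pos] = preds[pos][0]
            candidates.append(cur)
        if MASK not in candidates[0]:
            # Only one mask was left; the first candidate is the sequential step.
            seq = candidates[0]
            break
        # Verify: evaluate all candidates together (counted as one batched call).
        batch_preds = [model(c) for c in candidates]
        calls += 1
        accepted = 0  # candidates[0] always equals the sequential step
        for k in range(1, len(candidates)):
            # Accept candidate k only if the sequential step taken from
            # candidate k-1 would pick the same position and token.
            if top1(batch_preds[k - 1]) != (ranked[k], preds[ranked[k]][0]):
                break
            accepted = k
        # Reuse the verifier's output for the accepted state as the next draft source.
        seq, preds = candidates[accepted], batch_preds[accepted]
    return seq, calls


ref_seq, ref_calls = sequential_decode(toy_model, 12)
fast_seq, fast_calls = draft_and_verify_decode(toy_model, 12, draft_size=4)
print("same sequence:", fast_seq == ref_seq)
print("forward calls:", ref_calls, "vs", fast_calls)
```

In this sketch every verification round accepts at least the first candidate, so the number of (batched) forward calls never exceeds that of sequential decoding, and rejected drafts are simply discarded. In practice a batched call over several candidates costs more compute and memory than a single-sequence call, which the toy call counter does not capture.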

Cite

@article{arxiv.2510.00294,
  title  = {Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models},
  author = {Shutong Wu and Jiawei Zhang},
  journal = {arXiv preprint arXiv:2510.00294},
  year   = {2026}
}