English
Related papers

Related papers: Dependency-Aware Parallel Decoding via Attention f…

200 papers

The generation speed of LLMs are bottlenecked by autoregressive decoding, where tokens are predicted sequentially one by one. Alternatively, diffusion large language models (dLLMs) theoretically allow for parallel token generation, but in…

Computation and Language · Computer Science 2025-11-03 Daniel Israel , Guy Van den Broeck , Aditya Grover

Discrete diffusion language models (dLLMs) accelerate text generation by unmasking multiple tokens in parallel. However, parallel decoding introduces a distributional mismatch: it approximates the joint conditional using a fully factorized…

Computation and Language · Computer Science 2026-04-06 Liran Ringel , Ameen Ali , Yaniv Romano

Autoregressive decoding in large language models (LLMs) requires $\mathcal{O}(n)$ sequential steps for $n$ tokens, fundamentally limiting inference throughput. Recent diffusion-based LLMs (dLLMs) enable parallel token generation through…

Computation and Language · Computer Science 2025-10-06 Wenrui Bao , Zhiben Chen , Dan Xu , Yuzhang Shang

Diffusion large language models (dLLMs) have shown advantages in text generation, particularly due to their inherent ability for parallel decoding. However, constrained by the quality--speed trade-off, existing inference solutions adopt…

Computation and Language · Computer Science 2026-02-09 Lizhuo Luo , Zhuoran Shi , Jiajun Luo , Zhi Wang , Shen Ren , Wenya Wang , Tianwei Zhang

Masked diffusion language models (MDLMs) enable parallel decoding by predicting all masked positions at each denoising step, yet existing training-free samplers usually decide which positions to commit at token-level granularity. We revisit…

Machine Learning · Computer Science 2026-05-29 Heqiang Qi , Wei Huang , Mingyuan Bai , Xiangming Meng

Diffusion large language models (dLLMs) offer a promising paradigm for parallel text generation, but in practice they face an accuracy-parallelism trade-off, where increasing tokens per forward (TPF) often degrades generation quality.…

Computation and Language · Computer Science 2026-05-12 Haoyang Zhou , Li Kong , Shijie Ren , Xiting Wang , Shuang Liang , Guowei Wang , Zhenxuan Pan

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive language generation due to their potential for parallel decoding and global refinement of the entire sequence. To unlock this potential, DLM…

Machine Learning · Computer Science 2026-04-20 Xiang Xia , Wuyang Zhang , Jiazheng Liu , Cheng Yan , Yanyong Zhang

Diffusion Language Models (dLLMs) have garnered significant attention for their potential in highly parallel processing. The parallel capabilities of existing dLLMs stem from the assumption of conditional independence at high confidence…

Machine Learning · Computer Science 2026-05-13 Haohui Zhang , Zhiye Wang , Xiaoying Gan , Xinbing Wang , Bo Jiang

Masked diffusion language models (MDLMs) have recently emerged as a new paradigm in language modeling, offering flexible generation dynamics and enabling efficient parallel decoding. However, existing decoding strategies for pre-trained…

Computation and Language · Computer Science 2026-03-17 Xueyu Zhou , Yangrong Hu , Jian Huang

Auto-regressive models (ARMs) have established a dominant paradigm in language modeling. However, their strictly sequential decoding paradigm imposes fundamental constraints on both inference efficiency and modeling flexibility. To address…

Computation and Language · Computer Science 2026-04-13 Yuyan Zhou , Kai Syun Hou , Weiyu Chen , James Kwok

Diffusion large language models (dLLMs) have recently drawn considerable attention within the research community as a promising alternative to autoregressive generation, offering parallel token prediction and lower inference latency. Yet,…

Computation and Language · Computer Science 2025-10-01 Zigeng Chen , Gongfan Fang , Xinyin Ma , Ruonan Yu , Xinchao Wang

Large language models (LLMs) are increasingly used for long-content generation (e.g., long Chain-of-Thought reasoning) where decoding efficiency becomes a critical bottleneck: Autoregressive decoding is inherently limited by its sequential…

Computation and Language · Computer Science 2025-06-05 Zhepei Wei , Wei-Lin Chen , Xinyu Zhu , Yu Meng

Diffusion large language models (dLLMs) generate text by iteratively denoising masked token sequences. Although dLLMs can predict all masked positions in parallel within each step, the large number of denoising iterations still makes…

Computation and Language · Computer Science 2026-05-18 Shengyin Sun , Yiming Li , Renxi Liu , Xinqi Li , Hui-Ling Zhen , Weizhe Lin , Chen Chen , Xianzhi Yu , Mingxuan Yuan , Chen Ma

Masked Diffusion Language Models (MDLMs) enable parallel token decoding, providing a promising alternative to the sequential nature of autoregressive generation. However, their iterative denoising process remains computationally expensive…

Computation and Language · Computer Science 2026-03-10 Younjoo Lee , Junghoo Lee , Seungkyun Dan , Jaiyoung Park , Jung Ho Ahn

Diffusion-based large language models (dLLMs) have shown promising performance across various reasoning tasks, establishing themselves as an alternative to autoregressive large language models (LLMs). Unlike autoregressive LLMs that…

Computation and Language · Computer Science 2026-03-02 Xiangzhong Luo , Yilin An , Zhicheng Yu , Weichen Liu , Xu Yang

Diffusion language models (DLMs) offer a structural alternative to autoregressive generation: denoising can update tokens in arbitrary orders or in parallel rather than along a fixed left-to-right chain. In practice, fast DLM decoding…

Machine Learning · Computer Science 2026-05-12 Jeonseong Kim

Autoregressive decoding of large language models (LLMs) is memory bandwidth bounded, resulting in high latency and significant wastes of the parallel processing power of modern accelerators. Existing methods for accelerating LLM decoding…

Machine Learning · Computer Science 2024-02-06 Yichao Fu , Peter Bailis , Ion Stoica , Hao Zhang

Diffusion-based Large Language Models (dLLMs) parallelize text generation by framing decoding as a denoising process, but suffer from high computational overhead since they predict all future suffix tokens at each step while retaining only…

Computation and Language · Computer Science 2025-08-26 Xinhua Chen , Sitao Huang , Cong Guo , Chiyue Wei , Yintao He , Jianyi Zhang , Hai "Helen" Li , Yiran Chen

While most autoregressive LLMs are constrained to one-by-one decoding, diffusion LLMs (dLLMs) have attracted growing interest for their potential to dramatically accelerate inference through parallel decoding. Despite this promise, the…

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models, offering sub-linear generation latency and bidirectional capabilities that are particularly appealing for code generation and editing.…

Computation and Language · Computer Science 2026-05-19 Michael Hersche , Nicolas Menet , Ronan Tanios , Abbas Rahimi
‹ Prev 1 2 3 10 Next ›