English
Related papers

Related papers: Free Draft-and-Verification: Toward Lossless Paral…

200 papers

We present a novel inference scheme, self-speculative decoding, for accelerating Large Language Models (LLMs) without the need for an auxiliary model. This approach is characterized by a two-stage process: drafting and verification. The…

Computation and Language · Computer Science 2025-02-11 Jun Zhang , Jue Wang , Huan Li , Lidan Shou , Ke Chen , Gang Chen , Sharad Mehrotra

Diffusion Large Language Models (dLLMs) offer fast, parallel token generation, but their standalone use is plagued by an inherent efficiency-quality tradeoff. We show that, if carefully applied, the attributes of dLLMs can actually be a…

Machine Learning · Computer Science 2026-01-29 Rui Pan , Zhuofu Chen , Hongyi Liu , Arvind Krishnamurthy , Ravi Netravali

Diffusion-based large language models (Diffusion LLMs) have shown promise for non-autoregressive text generation with parallel decoding capabilities. However, the practical inference speed of open-sourced Diffusion LLMs often lags behind…

Computation and Language · Computer Science 2025-07-04 Chengyue Wu , Hao Zhang , Shuchen Xue , Zhijian Liu , Shizhe Diao , Ligeng Zhu , Ping Luo , Song Han , Enze Xie

Diffusion large language models (dLLMs) generate text through iterative denoising. In commonly adopted parallel decoding schemes, each step confirms only high-confidence positions while remasking the others. By analyzing dLLM denoising…

Computation and Language · Computer Science 2026-05-27 Kangyu Wang , Zhiyun Jiang , Haibo Feng , Weijia Zhao , Lin Liu , Jianguo Li , Zhenzhong Lan , Weiyao Lin

Large language models (LLMs) exhibit exceptional performance across a wide range of tasks; however, their token-by-token autoregressive generation process significantly hinders inference speed. Speculative decoding presents a promising…

Computation and Language · Computer Science 2025-03-04 Kai Lv , Honglin Guo , Qipeng Guo , Xipeng Qiu

Diffusion Large Language Models (dLLMs) have demonstrated promising generative capabilities and are increasingly used to produce formal languages defined by context-free grammars, such as source code and chemical expressions. However, as…

Computation and Language · Computer Science 2026-02-10 Yitong Zhang , Yongmin Li , Yuetong Liu , Jia Li , Xiaoran Jia , Zherui Li , Ge Li

Diffusion Large Language Models (DLLMs) promise fast parallel generation, yet open-source DLLMs still face a severe quality-speed trade-off: accelerating decoding by revealing multiple tokens often causes substantial quality degradation. We…

Computation and Language · Computer Science 2026-05-19 Fanqin Zeng , Feng Hong , Geng Yu , Huangjie Zheng , Xiaofeng Cao , Ya Zhang , Bo Han , Yanfeng Wang , Jiangchao Yao

Diffusion-based Large Language Models (dLLMs) have emerged as a competitive alternative to autoregressive models, offering unique advantages through bidirectional attention and parallel generation paradigms. However, the generation results…

Computation and Language · Computer Science 2025-10-07 Yifeng Gao , Ziang Ji , Yuxuan Wang , Biqing Qi , Hanlin Xu , Linfeng Zhang

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive language generation due to their potential for parallel decoding and global refinement of the entire sequence. To unlock this potential, DLM…

Machine Learning · Computer Science 2026-04-20 Xiang Xia , Wuyang Zhang , Jiazheng Liu , Cheng Yan , Yanyong Zhang

Autoregressive decoding in large language models (LLMs) requires $\mathcal{O}(n)$ sequential steps for $n$ tokens, fundamentally limiting inference throughput. Recent diffusion-based LLMs (dLLMs) enable parallel token generation through…

Computation and Language · Computer Science 2025-10-06 Wenrui Bao , Zhiben Chen , Dan Xu , Yuzhang Shang

Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast…

Computation and Language · Computer Science 2026-05-29 Jian Chen , Yesheng Liang , Zhijian Liu

Diffusion large language models (dLLMs) represent a significant advancement in text generation, offering parallel token decoding capabilities. However, existing open-source implementations suffer from quality-speed trade-offs that impede…

Computation and Language · Computer Science 2025-10-09 Fanheng Kong , Jingyuan Zhang , Yahui Liu , Zirui Wu , Yu Tian , Victoria W. , Guorui Zhou

Diffusion language models offer parallel token generation and inherent bidirectionality, promising more efficient and powerful sequence modeling compared to autoregressive approaches. However, state-of-the-art diffusion models (e.g., Dream…

Computation and Language · Computer Science 2025-10-10 Zhanqiu Hu , Jian Meng , Yash Akhauri , Mohamed S. Abdelfattah , Jae-sun Seo , Zhiru Zhang , Udit Gupta

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive (AR) LLMs for text generation, with the potential to decode multiple tokens in a single iteration. However, none of the existing open-source…

Machine Learning · Computer Science 2025-08-14 Xu Wang , Chenkai Xu , Yijie Jin , Jiachun Jin , Hao Zhang , Zhijie Deng

Diffusion large language models (dLLMs) have shown advantages in text generation, particularly due to their inherent ability for parallel decoding. However, constrained by the quality--speed trade-off, existing inference solutions adopt…

Computation and Language · Computer Science 2026-02-09 Lizhuo Luo , Zhuoran Shi , Jiajun Luo , Zhi Wang , Shen Ren , Wenya Wang , Tianwei Zhang

Although current Video-LLMs achieve impressive performance in video understanding tasks, their autoregressive decoding efficiency remains constrained by the massive number of video tokens. Visual token pruning can partially ease this…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Quan Kong , Yuhao Shen , Yicheng Ji , Huan Li , Cong Wang

Diffusion large language models (dLLMs) have recently drawn considerable attention within the research community as a promising alternative to autoregressive generation, offering parallel token prediction and lower inference latency. Yet,…

Computation and Language · Computer Science 2025-10-01 Zigeng Chen , Gongfan Fang , Xinyin Ma , Ruonan Yu , Xinchao Wang

Diffusion-based large language models (dLLMs) have shown promising performance across various reasoning tasks, establishing themselves as an alternative to autoregressive large language models (LLMs). Unlike autoregressive LLMs that…

Computation and Language · Computer Science 2026-03-02 Xiangzhong Luo , Yilin An , Zhicheng Yu , Weichen Liu , Xu Yang

Diffusion Large Language Models (DLLMs) have emerged as a compelling alternative to Autoregressive models, designed for fast parallel generation. However, existing DLLMs are plagued by a severe quality-speed trade-off, where faster parallel…

Computation and Language · Computer Science 2025-09-29 Feng Hong , Geng Yu , Yushi Ye , Haicheng Huang , Huangjie Zheng , Ya Zhang , Yanfeng Wang , Jiangchao Yao

Discrete diffusion language models improve generation efficiency through parallel token prediction, but standard $X_0$ prediction methods introduce factorization errors by approximating the clean token posterior with independent token-wise…

Computation and Language · Computer Science 2026-05-15 Xun Fang , Yunchen Li , Hang Yuan , Zhou Yu
‹ Prev 1 2 3 10 Next ›