Related papers: BlockBatch: Multi-Scale Consensus Decoding for Eff…

200 papers

Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast…

Computation and Language · Computer Science 2026-05-29 Jian Chen, Yesheng Liang, Zhijian Liu

Autoregressive Models (ARMs) have long dominated the landscape of Large Language Models. Recently, a new paradigm has emerged in the form of diffusion-based Large Language Models (dLLMs), which generate text by iteratively denoising masked…

Machine Learning · Computer Science 2025-06-10 Zhiyuan Liu, Yicun Yang, Yaojie Zhang, Junjie Chen, Chang Zou, Qingyuan Wei, Shaobo Wang, Linfeng Zhang

Diffusion-based large language models (Diffusion LLMs) have shown promise for non-autoregressive text generation with parallel decoding capabilities. However, the practical inference speed of open-sourced Diffusion LLMs often lags behind…

Computation and Language · Computer Science 2025-07-04 Chengyue Wu, Hao Zhang, Shuchen Xue, Zhijian Liu, Shizhe Diao, Ligeng Zhu, Ping Luo, Song Han, Enze Xie

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive language generation due to their potential for parallel decoding and global refinement of the entire sequence. To unlock this potential, DLM…

Machine Learning · Computer Science 2026-04-20 Xiang Xia, Wuyang Zhang, Jiazheng Liu, Cheng Yan, Yanyong Zhang

Diffusion-based large language models (dLLMs) are gaining attention for their inherent capacity for parallel decoding, offering a compelling alternative to autoregressive LLMs. Among various decoding strategies, block-wise…

Machine Learning · Computer Science 2026-03-03 Guanxi Lu, Hao Mark Chen, Yuto Karashima, Zhican Wang, Daichi Fujiki, Hongxiang Fan

Autoregressive (AR) large language models (LLMs) have achieved remarkable performance across a wide range of natural language tasks, yet their inherent sequential decoding limits inference efficiency. In this work, we propose Fast-dLLM v2,…

Computation and Language · Computer Science 2025-10-01 Chengyue Wu, Hao Zhang, Shuchen Xue, Shizhe Diao, Yonggan Fu, Zhijian Liu, Pavlo Molchanov, Ping Luo, Song Han, Enze Xie

Diffusion-based large language models (dLLMs), despite their promising performance, still suffer from inferior inference efficiency. This is because dLLMs rely on bidirectional attention and cannot directly benefit from the standard…

Computation and Language · Computer Science 2026-02-17 Yuchu Jiang, Yue Cai, Xiangzhong Luo, Jiale Fu, Jiarui Wang, Chonghan Liu, Xu Yang

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive (AR) LLMs for text generation, with the potential to decode multiple tokens in a single iteration. However, none of the existing open-source…

Machine Learning · Computer Science 2025-08-14 Xu Wang, Chenkai Xu, Yijie Jin, Jiachun Jin, Hao Zhang, Zhijie Deng

Diffusion Large Language Models (dLLMs) offer fast, parallel token generation, but their standalone use is plagued by an inherent efficiency-quality tradeoff. We show that, if carefully applied, the attributes of dLLMs can actually be a…

Machine Learning · Computer Science 2026-01-29 Rui Pan, Zhuofu Chen, Hongyi Liu, Arvind Krishnamurthy, Ravi Netravali

Diffusion language models offer parallel token generation and inherent bidirectionality, promising more efficient and powerful sequence modeling compared to autoregressive approaches. However, state-of-the-art diffusion models (e.g., Dream…

Computation and Language · Computer Science 2025-10-10 Zhanqiu Hu, Jian Meng, Yash Akhauri, Mohamed S. Abdelfattah, Jae-sun Seo, Zhiru Zhang, Udit Gupta

Diffusion Large Language Models (DLLMs) offer a compelling alternative to Auto-Regressive models, but their deployment is constrained by high decoding cost. In this work, we identify a key inefficiency in DLLM decoding: while computation is…

Machine Learning · Computer Science 2026-02-02 Kaihua Liang, Xin Tan, An Zhong, Hong Xu, Marco Canini

Diffusion language models (DLMs) generate text through iterative denoising, but inference requires full-sequence attention at every iteration, resulting in substantial redundant computation on masked tokens. Block-wise diffusion can reduce…

Machine Learning · Computer Science 2026-02-03 Fengrui Zuo, Zhiwei Ke, Yiming Liu, Wenqi Lou, Chao Wang, Xuehai Zhou

Diffusion Large Language Models (dLLMs) enable breakthroughs in reasoning and parallel decoding but suffer from prohibitive quadratic computational complexity and memory overhead during inference. Current caching techniques accelerate…

Computation and Language · Computer Science 2025-11-06 Yuerong Song, Xiaoran Liu, Ruixiao Li, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Ziwei He, Xipeng Qiu

Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generation. We address this bottleneck in the…

While most autoregressive LLMs are constrained to one-by-one decoding, diffusion LLMs (dLLMs) have attracted growing interest for their potential to dramatically accelerate inference through parallel decoding. Despite this promise, the…

While autoregressive (AR) Vision-Language-Action (VLA) models have demonstrated formidable reasoning capabilities in robotic tasks, their sequential decoding process often incurs high inference latency and may amplify error accumulation…

Robotics · Computer Science 2026-05-14 Ruiheng Wang, Shuanghao Bai, Haoran Zhang, Badong Chen, Xiangyu Xu

Diffusion Large Language Models (dLLMs) offer a compelling paradigm for natural language generation, leveraging parallel decoding and bidirectional attention to achieve superior global coherence compared to autoregressive models. While…

Machine Learning · Computer Science 2026-01-28 Zhongyu Xiao, Zhiwei Hao, Jianyuan Guo, Yong Luo, Jia Liu, Jie Xu, Han Hu

Diffusion Language Models (DLMs) have been seen as a promising competitor for autoregressive language models. However, diffusion language models have long been constrained by slow inference. A core challenge is that their non-autoregressive…

Computation and Language · Computer Science 2025-05-22 Xinyin Ma, Runpeng Yu, Gongfan Fang, Xinchao Wang

Diffusion-based large language models (dLLMs) have recently gained significant attention for their exceptional performance and inherent potential for parallel decoding. Existing frameworks further enhance their inference efficiency by…

Computation and Language · Computer Science 2025-12-01 Linye Wei, Wenjue Chen, Pingzhi Tang, Xiaotian Guo, Le Ye, Runsheng Wang, Meng Li

Diffusion large language models (dLLMs) have shown advantages in text generation, particularly due to their inherent ability for parallel decoding. However, constrained by the quality-speed trade-off, existing inference solutions adopt…

Computation and Language · Computer Science 2026-02-09 Lizhuo Luo, Zhuoran Shi, Jiajun Luo, Zhi Wang, Shen Ren, Wenya Wang, Tianwei Zhang