Related papers: Stability-Weighted Decoding for Diffusion Language…

STDec: Spatio-Temporal Stability Guided Decoding for dLLMs

Diffusion Large Language Models (dLLMs) have achieved rapid progress, viewed as a promising alternative to the autoregressive paradigm. However, most dLLM decoders still adopt a global confidence threshold, and do not explicitly model local…

Computation and Language · Computer Science 2026-04-09 Yuzhe Chen, Jiale Cao, Xuyang Liu, Jin Xie, Aiping Yang, Yanwei Pang

Self Speculative Decoding for Diffusion Large Language Models

Diffusion-based Large Language Models (dLLMs) have emerged as a competitive alternative to autoregressive models, offering unique advantages through bidirectional attention and parallel generation paradigms. However, the generation results…

Computation and Language · Computer Science 2025-10-07 Yifeng Gao, Ziang Ji, Yuxuan Wang, Biqing Qi, Hanlin Xu, Linfeng Zhang

PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding

Diffusion large language models (dLLMs) generate text by iteratively denoising masked token sequences. Although dLLMs can predict all masked positions in parallel within each step, the large number of denoising iterations still makes…

Computation and Language · Computer Science 2026-05-18 Shengyin Sun, Yiming Li, Renxi Liu, Xinqi Li, Hui-Ling Zhen, Weizhe Lin, Chen Chen, Xianzhi Yu, Mingxuan Yuan, Chen Ma

Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models

Diffusion large language models (dLLMs) generate text through iterative denoising, yet current decoding strategies discard rich intermediate predictions in favor of the final output. Our work here reveals a critical phenomenon, temporal…

Computation and Language · Computer Science 2025-10-07 Wen Wang, Bozhen Fang, Chenchen Jing, Yongliang Shen, Yangyi Shen, Qiuyu Wang, Hao Ouyang, Hao Chen, Chunhua Shen

Speculative Safety-Aware Decoding

Despite extensive efforts to align Large Language Models (LLMs) with human values and safety rules, jailbreak attacks that exploit certain vulnerabilities continuously emerge, highlighting the need to strengthen existing LLMs with…

Machine Learning · Computer Science 2025-09-30 Xuekang Wang, Shengyu Zhu, Xueqi Cheng

CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credit

Diffusion large language models (dLLMs) generate text through iterative denoising. In commonly adopted parallel decoding schemes, each step confirms only high-confidence positions while remasking the others. By analyzing dLLM denoising…

Computation and Language · Computer Science 2026-05-27 Kangyu Wang, Zhiyun Jiang, Haibo Feng, Weijia Zhao, Lin Liu, Jianguo Li, Zhenzhong Lan, Weiyao Lin

$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive generation by enabling parallel token prediction. However, practical dLLM decoding still suffers from high inference latency, which limits…

Computation and Language · Computer Science 2026-04-22 Zhenbang Du, Kejing Xia, Xinrui Zhong, Yonggan Fu, Nicolai Oswald, Binfei Ji, Brucek Khailany, Pavlo Molchanov, Yingyan Lin

STaRR: Spatial-Temporal Token-Dynamics-Aware Responsive Remasking for Diffusion Language Models

Diffusion Language Models (DLMs) enable parallel decoding via iterative denoising, where remasking strategies play a critical role in balancing inference speed and output quality. Existing methods predominantly rely on static confidence…

Computation and Language · Computer Science 2026-02-24 Xinhao Sun, Huaijin Zhao, Maoliang Li, Zihao Zheng, Jiayu Chen, Yun Liang, Xiang Chen

Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models

Training stability is a persistent challenge in the pre-training of large language models (LLMs), particularly for architectures such as Post-Norm Transformers, which are prone to gradient explosion and dissipation. In this paper, we…

Computation and Language · Computer Science 2025-02-26 Ya Wang, Zhijian Zhuo, Yutao Zeng, Xun Zhou, Jian Yang, Xiaoqing Li

Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput

Speculative decoding accelerates large language model (LLM) inference by using a lightweight draft model to propose tokens that are later verified by a stronger target model. While effective in centralized systems, its behavior in…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-18 Jingwei Song, Wanyi Chen, Xinyuan Song, Max, Chris Tong, Gufeng Chen, Tianyi Zhao, Eric Yang, Bill Shi, Lynn Ai

TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs

Accelerating the inference of large language models (LLMs) has been a critical challenge in generative AI. Speculative decoding (SD) substantially improves LLM inference efficiency. However, its utility is limited by a fundamental…

Computation and Language · Computer Science 2026-05-05 Sibo Xiao, Jinyuan Fu, Zhongle Xie, Lidan Shou

DLM-SWAI: Steering Diffusion Language Models Before They Unmask

Steering language model generation toward desired textual properties is essential for practical deployment, and inference-time methods are particularly appealing because they enable controllable generation without retraining. Recent work…

Computation and Language · Computer Science 2026-05-29 Hyeseon An, Yo-Sub Han

dVoting: Fast Voting for dLLMs

Diffusion Large Language Models (dLLMs) represent a new paradigm beyond autoregressive modeling, offering competitive performance while naturally enabling a flexible decoding process. Specifically, dLLMs can generate tokens at arbitrary…

Computation and Language · Computer Science 2026-02-13 Sicheng Feng, Zigeng Chen, Xinyin Ma, Gongfan Fang, Xinchao Wang

Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models

Large Language Models (LLMs) have demonstrated impressive performance on multiple-choice question answering (MCQA) benchmarks, yet they remain highly vulnerable to minor input perturbations. In this paper, we introduce and evaluate Token…

Computation and Language · Computer Science 2025-06-12 Jui-Ming Yao, Hao-Yuan Chen, Zi-Xian Tang, Bing-Jia Tan, Sheng-Wei Peng, Bing-Cheng Xie, Shun-Feng Su

Reward-Weighted Sampling: Enhancing Non-Autoregressive Characteristics in Masked Diffusion LLMs

Masked diffusion models (MDMs) offer a promising non-autoregressive alternative for large language modeling. Standard decoding methods for MDMs, such as confidence-based sampling, select tokens independently based on individual token…

Computation and Language · Computer Science 2025-09-23 Daehoon Gwak, Minseo Jung, Junwoo Park, Minho Park, ChaeHun Park, Junha Hyung, Jaegul Choo

DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving

Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments. Existing speculative decoding (SD) techniques accelerate token generation but remain…

Machine Learning · Computer Science 2025-12-02 Fengze Yu, Leshu Li, Brad McDanel, Sai Qian Zhang

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

Diffusion large language models (dLLMs) offer a promising paradigm for parallel text generation, but in practice they face an accuracy-parallelism trade-off, where increasing tokens per forward (TPF) often degrades generation quality.…

Computation and Language · Computer Science 2026-05-12 Haoyang Zhou, Li Kong, Shijie Ren, Xiting Wang, Shuang Liang, Guowei Wang, Zhenxuan Pan

Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models

Diffusion large language models (dLLMs) have recently emerged as a promising alternative to autoregressive (AR) models, offering advantages such as accelerated parallel decoding and bidirectional context modeling. However, the vanilla…

Computation and Language · Computer Science 2025-10-07 Runchu Tian, Junxia Cui, Xueqiang Xu, Feng Yao, Jingbo Shang

Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models

Diffusion models have shown promise in text generation, but often struggle with generating long, coherent, and contextually accurate text. Token-level diffusion doesn't model word-order dependencies explicitly and operates on short, fixed…

Computation and Language · Computer Science 2025-05-27 Xiaochen Zhu, Georgi Karadzhov, Chenxi Whitehouse, Andreas Vlachos

Decoding Strategies for Diffusion-Based ASR: A Systematic Evaluation of Confidence-Based Thresholding

While LLM-based Automatic Speech Recognition (ASR) achieves high accuracy, its speed is limited by sequential autoregressive decoding. Diffusion Language Models (DLMs) offer a parallel alternative, yet their decoding strategies remain…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-29 Jeong Hun Yeo, Minsu Kim, Hyeongseop Rha, Yong Man Ro