Computation and Language · Computer Science
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang +5
2024-06-05
Computation and Language · Computer Science
Speculative Decoding: Performance or Illusion?
Xiaoxuan Liu, Jiaxiang Yu, Jongseok Park, Ion Stoica +1
2026-03-19
Computation and Language · Computer Science
Tutorial Proposal: Speculative Decoding for Efficient LLM Inference
Heming Xia, Cunxiao Du, Yongqi Li, Qian Liu +1
2025-03-04
Computation and Language · Computer Science
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason +2
2024-02-20
Distributed, Parallel, and Cluster Computing · Computer Science
Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput
Jingwei Song, Wanyi Chen, Xinyuan Song, Max +6
2025-11-18
Artificial Intelligence · Computer Science
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Alvin Cheung +3
2024-06-11
Computation and Language · Computer Science
SSSD: Simply-Scalable Speculative Decoding
Michele Marzollo, Jiawei Zhuang, Niklas Roemer, Niklas Zwingenberger +2
2026-01-08
Computation and Language · Computer Science
Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios
Luohe Shi, Zuchao Li, Lefei Zhang, Baoyuan Qi +2
2025-11-26
Computation and Language · Computer Science
Automatic Task Detection and Heterogeneous LLM Speculative Decoding
Danying Ge, Jianhua Gao, Qizhi Jiang, Yifei Feng +1
2025-05-14
Computation and Language · Computer Science
Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling
Shengyin Sun, Yiming Li, Xing Li, Yingzhao Lian +7
2025-09-08
Computer Vision and Pattern Recognition · Computer Science
Speculative Decoding Reimagined for Multimodal Large Language Models
Luxi Lin, Zhihang Lin, Zhanpeng Zeng, Rongrong Ji
2025-05-21
Computation and Language · Computer Science
Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky +4
2025-06-12
Computation and Language · Computer Science
Speculative Contrastive Decoding
Hongyi Yuan, Keming Lu, Fei Huang, Zheng Yuan +1
2024-03-14
Computation and Language · Computer Science
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
Euiin Yi, Taehyeon Kim, Hongseok Jeung, Du-Seong Chang +1
2024-11-12
Computation and Language · Computer Science
Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion
Jacob K Christopher, Brian R Bartoldson, Tal Ben-Nun, Michael Cardei +2
2025-02-12
Computation and Language · Computer Science
Graph-Structured Speculative Decoding
Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu +4
2024-07-24
Software Engineering · Computer Science
An Empirical Study of Speculative Decoding on Software Engineering Tasks
Yijia Li, Junkai Chen, Xing Hu, Xin Xia
2026-05-05
Machine Learning · Computer Science
Fast Inference via Hierarchical Speculative Decoding
Clara Mohri, Haim Kaplan, Tal Schuster, Yishay Mansour +1
2025-10-24
Computation and Language · Computer Science
Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models
Jonathan Mamou, Oren Pereg, Daniel Korat, Moshe Berchansky +3
2024-11-08
Machine Learning · Computer Science
Benchmarking the Energy Savings with Speculative Decoding Strategies
Rohit Dutta, Paramita Koley, Soham Poddar, Janardan Misra +4
2026-02-11
Machine Learning · Computer Science
A Theoretical Perspective for Speculative Decoding Algorithm
Ming Yin, Minshuo Chen, Kaixuan Huang, Mengdi Wang
2024-11-05