Computation and Language · Computer Science
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang
2024-07-02
Artificial Intelligence · Computer Science
AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures
Situo Zhang, Hankun Wang, Da Ma, Zichen Zhu +3
2024-12-30
Computation and Language · Computer Science
Graph-Structured Speculative Decoding
Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu +4
2024-07-24
Computation and Language · Computer Science
Speculative Decoding with a Speculative Vocabulary
Miles Williams, Young D. Kwon, Rui Li, Alexandros Kouris +1
2026-02-17
Distributed, Parallel, and Cluster Computing · Computer Science
Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput
Jingwei Song, Wanyi Chen, Xinyuan Song, Max +6
2025-11-18
Computer Vision and Pattern Recognition · Computer Science
Speculative Decoding Reimagined for Multimodal Large Language Models
Luxi Lin, Zhihang Lin, Zhanpeng Zeng, Rongrong Ji
2025-05-21
Computation and Language · Computer Science
Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning
Jiebin Zhang, Zhenghan Yu, Liang Wang, Nan Yang +7
2026-03-03
Computation and Language · Computer Science
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
Jun Zhang, Jue Wang, Huan Li, Lidan Shou +3
2025-02-11
Artificial Intelligence · Computer Science
Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference
Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong +1
2025-03-17
Machine Learning · Computer Science
DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving
Fengze Yu, Leshu Li, Brad McDanel, Sai Qian Zhang
2025-12-02
Machine Learning · Computer Science
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang
2025-03-05
Computation and Language · Computer Science
DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang, Murali Annavaram
2025-08-08
Machine Learning · Computer Science
Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement
Wonseok Jeon, Mukul Gagrani, Raghavv Goel, Junyoung Park +2
2024-03-06
Computation and Language · Computer Science
SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding
Jameson Sandler, Jacob K. Christopher, Thomas Hartvigsen, Ferdinando Fioretto
2025-11-05
Computation and Language · Computer Science
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
Euiin Yi, Taehyeon Kim, Hongseok Jeung, Du-Seong Chang +1
2024-11-12
Computation and Language · Computer Science
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
Kai Lv, Honglin Guo, Qipeng Guo, Xipeng Qiu
2025-03-04
Computation and Language · Computer Science
Efficient Speculative Decoding for Llama at Scale: Challenges and Solutions
Bangsheng Tang, Carl Chengyan Fu, Fei Kou, Grigory Sizov +34
2025-08-12
Machine Learning · Computer Science
Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs
Rui Pan, Zhuofu Chen, Hongyi Liu, Arvind Krishnamurthy +1
2026-01-29
Computation and Language · Computer Science
S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models
Parsa Kavehzadeh, Mohammadreza Pourreza, Mojtaba Valipour, Tinashu Zhu +4
2024-07-03
Computation and Language · Computer Science
Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models
Jonathan Mamou, Oren Pereg, Daniel Korat, Moshe Berchansky +3
2024-11-08