Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing

Jie Jiang; Xing Sun; Ruotian Chen; Jianan Su; Kaixin Shen


Computation and Language 2026-05-18 v2


Abstract

Speculative decoding accelerates LLM inference by having a lightweight draft model propose speculative windows of candidate tokens that a larger target model verifies in parallel. In practice, speculative efficiency is often bottlenecked by hard-to-draft positions, where an early mismatch truncates the accepted prefix and invalidates the rest of the speculative window. Yet most learning-based drafters are still optimized with token-level supervised objectives, even though speculative utility is inherently window-level and prefix-sensitive. We propose PPOW (Performance-Driven Policy Optimization with Adaptive Windowing), a reinforcement learning framework that shifts drafter optimization from token-level imitation to window-level optimization. PPOW combines three components: a Cost-Aware Speedup Reward, a Distribution-Based Proximity Reward, and Adaptive Divergence-Aware Windowing, which prioritizes informative windows with high confidence-weighted draft-target divergence. Under a unified decoding protocol, PPOW achieves average acceptance lengths of 6.29-6.52 and speedups of 3.39-4.36× across multiple model families and benchmarks. These results show that performance-driven, window-level optimization is a practical approach to improving speculative decoding efficiency.
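The window-level, prefix-sensitive behaviour described above can be illustrated with a small sketch. It assumes greedy (exact-match) verification, a fixed window of k draft tokens, and an acceptance length that counts the target's extra token per step. `estimated_speedup` is a common back-of-the-envelope cost model, not PPOW's Cost-Aware Speedup Reward. `window_score` is a hypothetical stand-in for the paper's confidence-weighted divergence, whose exact form the abstract does not give. All function names here are illustrative.

```python
from typing import List, Sequence, Tuple

Dist = Sequence[float]  # a probability distribution over a (toy) vocabulary


def accepted_length(draft: Sequence[int], target: Sequence[int]) -> int:
    """Tokens emitted by one greedy verification step.

    draft:  the k tokens proposed by the drafter for this window.
    target: the target model's greedy token at each of those k positions,
            plus one position past the window (length k + 1).
    The first mismatch truncates the accepted prefix, so later draft tokens
    are discarded even if they happen to match. The target's token at the
    stopping position (a correction or a bonus token) is always kept.
    """
    accepted = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        accepted += 1
    return accepted + 1


def estimated_speedup(tau: float, k: int, c: float) -> float:
    """Rough speedup over plain decoding: tau tokens per step, where a step
    costs one target pass plus k draft passes at c times the target's cost."""
    return tau / (1.0 + k * c)


def total_variation(p: Dist, q: Dist) -> float:
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def window_score(draft_dists: List[Dist], target_dists: List[Dist]) -> float:
    """HYPOTHETICAL confidence-weighted divergence of one window: each
    position's draft/target total-variation distance, weighted by the
    drafter's top-1 probability (confident positions that disagree with
    the target contribute most)."""
    return sum(max(p) * total_variation(p, q)
               for p, q in zip(draft_dists, target_dists))


def rank_windows(windows: List[Tuple[List[Dist], List[Dist]]], top_k: int) -> List[int]:
    """Indices of the top_k windows by window_score, highest first."""
    scores = [window_score(d, t) for d, t in windows]
    order = sorted(range(len(windows)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


if __name__ == "__main__":
    # Mismatch at position 2: the matching 7 at position 3 is still thrown away.
    print(accepted_length([5, 9, 2, 7], [5, 9, 3, 7, 1]))  # 3
    # Full match of a 4-token window: 4 accepted + 1 bonus token.
    print(accepted_length([5, 9, 2, 7], [5, 9, 2, 7, 1]))  # 5
    print(estimated_speedup(tau=3.0, k=4, c=0.125))  # 2.0
    calm = ([[0.5, 0.5]], [[0.4, 0.6]])   # unsure drafter, small disagreement
    sharp = ([[0.9, 0.1]], [[0.2, 0.8]])  # confident drafter, large disagreement
    print(rank_windows([calm, sharp], top_k=1))  # [1]
```

Exact-match verification keeps the sketch short. With sampling, standard speculative sampling instead accepts each draft token with probability min(1, p_target/p_draft), but the prefix truncation at the first rejection works the same way.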

Keywords

speculative decoding; hyperparameter optimization; policy gradient

Cite

@article{arxiv.2605.14978,
  title  = {Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing},
  author = {Jie Jiang and Xing Sun and Ruotian Chen and Jianan Su and Kaixin Shen},
  journal= {arXiv preprint arXiv:2605.14978},
  year   = {2026}
}
