Parallel Sampling via Autospeculation

Data Structures and Algorithms; 2025-11-12; v1; Distributed, Parallel, and Cluster Computing; Machine Learning; Probability

Abstract

We present parallel algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models. An any-order autoregressive model accesses a target distribution $\mu$ on $[q]^n$ through an oracle that provides conditional marginals, while a denoising diffusion model accesses a target distribution $\mu$ on $\mathbb{R}^n$ through an oracle that provides conditional means under Gaussian noise. Standard sequential sampling algorithms require $\widetilde{O}(n)$ time to produce a sample from $\mu$ in either setting. We show that, by issuing oracle calls in parallel, the expected sampling time can be reduced to $\widetilde{O}(n^{1/2})$. This improves the previous $\widetilde{O}(n^{2/3})$ bound for any-order autoregressive models and yields the first parallel speedup for diffusion models in the high-accuracy regime, under the relatively mild assumption that the support of $\mu$ is bounded. We introduce a novel technique to obtain our results: speculative rejection sampling. This technique leverages an auxiliary "speculative" distribution $\nu$ that approximates $\mu$ to accelerate sampling. Our technique is inspired by the well-studied "speculative decoding" techniques popular in large language models, but differs in key ways. First, we use "autospeculation": we build the speculation $\nu$ out of the same oracle that defines $\mu$. In contrast, speculative decoding typically requires a separate, faster, but potentially less accurate "draft" model $\nu$. Second, the key differentiating factor in our technique is that we make and accept speculations at a "sequence" level rather than at the level of a single step or a few steps. This last fact is key to unlocking our parallel runtime of $\widetilde{O}(n^{1/2})$.
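To make the sequential baseline concrete, the following is a minimal sketch of sampling from a distribution on $[q]^n$ using only a conditional-marginal oracle, one coordinate at a time. It is not the parallel algorithm of the paper. The toy distribution `mu`, the brute-force `make_oracle`, and `sequential_sample` are hypothetical names introduced here for illustration, and the oracle is assumed to take a dictionary of pinned coordinates plus the index of the coordinate to query.

```python
import random

def make_oracle(mu, q):
    """Brute-force conditional-marginal oracle for an explicit table.

    `mu` maps tuples in {0, ..., q-1}^n to nonnegative weights.
    (Illustrative assumption: a real oracle would be a learned model.)
    """
    def oracle(pinned, i):
        # Accumulate the weight of every configuration consistent with `pinned`,
        # grouped by the value of coordinate i.
        w = [0.0] * q
        for x, p in mu.items():
            if all(x[j] == v for j, v in pinned.items()):
                w[x[i]] += p
        total = sum(w)
        return [v / total for v in w]
    return oracle

def sequential_sample(oracle, n, q, rng):
    """Sample one coordinate per oracle call, in a random order."""
    order = list(range(n))
    rng.shuffle(order)  # "any-order": the pinning order is arbitrary
    pinned = {}
    calls = 0
    for i in order:
        probs = oracle(pinned, i)  # conditional marginal of coordinate i
        calls += 1
        pinned[i] = rng.choices(range(q), weights=probs)[0]
    return tuple(pinned[i] for i in range(n)), calls

# Toy target: uniform over the two constant strings of length 3.
n, q = 3, 2
mu = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
oracle = make_oracle(mu, q)
sample, calls = sequential_sample(oracle, n, q, random.Random(0))
print(sample, calls)
```

Each coordinate here waits for all earlier ones to be pinned, so the sketch makes exactly $n$ oracle calls in sequence. This is the dependency chain that issuing oracle calls in parallel is meant to shorten.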

Cite

@article{arxiv.2511.07869,
  title  = {Parallel Sampling via Autospeculation},
  author = {Nima Anari and Carlo Baronio and CJ Chen and Alireza Haqi and Frederic Koehler and Anqi Li and Thuy-Duong Vuong},
  journal= {arXiv preprint arXiv:2511.07869},
  year   = {2025}
}