Parallel Sampling via Autospeculation
Abstract
We present parallel algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models. An any-order autoregressive model accesses a target distribution $\mu$ on $[q]^n$ through an oracle that provides conditional marginals, while a denoising diffusion model accesses a target distribution $\mu$ on $\mathbb{R}^n$ through an oracle that provides conditional means under Gaussian noise. Standard sequential sampling algorithms require $\widetilde{O}(n)$ time to produce a sample from $\mu$ in either setting. We show that, by issuing oracle calls in parallel, the expected sampling time can be reduced to $\widetilde{O}(n^{1/2})$. This improves the previous $\widetilde{O}(n^{2/3})$ bound for any-order autoregressive models and yields the first parallel speedup for diffusion models in the high-accuracy regime, under the relatively mild assumption that the support of $\mu$ is bounded. We introduce a novel technique to obtain our results: speculative rejection sampling. This technique leverages an auxiliary ``speculative'' distribution~$\nu$ that approximates~$\mu$ to accelerate sampling. Our technique is inspired by the well-studied ``speculative decoding'' techniques popular in large language models, but differs in key ways. Firstly, we use ``autospeculation,'' namely we build the speculation $\nu$ out of the same oracle that defines~$\mu$. In contrast, speculative decoding typically requires a separate, faster, but potentially less accurate ``draft'' model $\nu$. Secondly, the key differentiating factor in our technique is that we make and accept speculations at a ``sequence'' level rather than at the level of single (or a few) steps. This last fact is key to unlocking our parallel runtime of $\widetilde{O}(n^{1/2})$.
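To make the sequential baseline concrete, the following is a minimal sketch of sampling through a conditional-marginal oracle, one coordinate at a time. It is not the parallel algorithm of the paper: it only illustrates why the standard approach uses one oracle round per coordinate. The names `make_oracle` and `sequential_sample` are hypothetical, and the oracle is built from an explicit probability table purely for illustration (a real oracle would be a learned model).

```python
import random


def make_oracle(mu):
    """Build a toy conditional-marginal oracle from an explicit table.

    mu maps each point x (a tuple in {0..q-1}^n) to its probability.
    This brute-force construction is an assumption made for illustration.
    """
    def oracle(pinned, i, q):
        # Return [P(x_i = a | x_j = v for all (j, v) in pinned) for a in range(q)].
        weights = [0.0] * q
        for x, p in mu.items():
            if all(x[j] == v for j, v in pinned.items()):
                weights[x[i]] += p
        total = sum(weights)
        return [w / total for w in weights]
    return oracle


def sequential_sample(oracle, n, q, order, rng):
    """Sample by pinning coordinates one at a time, in the given order.

    Each coordinate needs one oracle call that depends on all earlier
    choices, so the n calls cannot be issued in parallel as written.
    """
    pinned = {}
    calls = 0
    for i in order:
        marginal = oracle(pinned, i, q)
        calls += 1
        pinned[i] = rng.choices(range(q), weights=marginal)[0]
    return tuple(pinned[i] for i in range(n)), calls


if __name__ == "__main__":
    # Toy target: two perfectly correlated configurations on {0,1}^3.
    mu = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
    sample, calls = sequential_sample(make_oracle(mu), 3, 2, [2, 0, 1], random.Random(0))
    print(sample, calls)
```

Because the oracle accepts any set of pinned coordinates, the `order` argument can be any permutation, which is what "any-order" refers to here. The sketch always spends exactly `n` dependent oracle rounds.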
Cite
@article{arxiv.2511.07869,
title = {Parallel Sampling via Autospeculation},
author = {Nima Anari and Carlo Baronio and CJ Chen and Alireza Haqi and Frederic Koehler and Anqi Li and Thuy-Duong Vuong},
journal = {arXiv preprint arXiv:2511.07869},
year = {2025}
}