Parallel Sampling via Autospeculation

Nima Anari; Carlo Baronio; CJ Chen; Alireza Haqi; Frederic Koehler; Anqi Li; Thuy-Duong Vuong

Categories: Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; Machine Learning; Probability. Version: v1, 2025-11-12.

Abstract

We present parallel algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models. An any-order autoregressive model accesses a target distribution $\mu$ on $[q]^n$ through an oracle that provides conditional marginals, while a denoising diffusion model accesses a target distribution $\mu$ on $\mathbb{R}^n$ through an oracle that provides conditional means under Gaussian noise. Standard sequential sampling algorithms require $\widetilde{O}(n)$ time to produce a sample from $\mu$ in either setting. We show that, by issuing oracle calls in parallel, the expected sampling time can be reduced to $\widetilde{O}(n^{1/2})$. This improves the previous $\widetilde{O}(n^{2/3})$ bound for any-order autoregressive models and yields the first parallel speedup for diffusion models in the high-accuracy regime, under the relatively mild assumption that the support of $\mu$ is bounded. We introduce a novel technique to obtain our results: speculative rejection sampling. This technique leverages an auxiliary "speculative" distribution $\nu$ that approximates $\mu$ to accelerate sampling. Our technique is inspired by the well-studied "speculative decoding" techniques popular in large language models, but differs in key ways. First, we use "autospeculation": we build the speculation $\nu$ out of the same oracle that defines $\mu$. In contrast, speculative decoding typically requires a separate, faster, but potentially less accurate "draft" model $\nu$. Second, the key differentiating factor in our technique is that we make and accept speculations at the level of an entire sequence rather than at the level of a single step (or a few steps). This sequence-level speculation is what unlocks our parallel runtime of $\widetilde{O}(n^{1/2})$.
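To make the two baselines named in the abstract concrete, here is a minimal Python sketch under loudly stated assumptions: the target $\mu$ on $[q]^n$ is a small explicit probability table (the hypothetical `make_toy_target`), and the conditional-marginal oracle is simulated by brute-force summation (the hypothetical `conditional_marginal`, using a fixed left-to-right order). It shows (i) the standard sequential sampler, which needs $n$ oracle calls one after another, and (ii) plain textbook rejection sampling from a proposal $\nu$. This is not the authors' speculative rejection sampling or autospeculation, which builds $\nu$ from the oracle itself and runs oracle calls in parallel; all function names here are illustrative only.

```python
import itertools
import random

def make_toy_target(n, q, seed=0):
    """Hypothetical toy target mu on [q]^n stored as an explicit table.

    Every sequence gets positive mass, so all conditionals are well defined.
    Real models never enumerate all q**n sequences like this.
    """
    rng = random.Random(seed)
    seqs = list(itertools.product(range(q), repeat=n))
    weights = [rng.random() + 0.1 for _ in seqs]
    total = sum(weights)
    return {s: w / total for s, w in zip(seqs, weights)}

def conditional_marginal(mu, prefix, q):
    """Stand-in oracle: P_mu(x_{k+1} = a | x_1..x_k = prefix) for each a in [q].

    Computed by brute-force summation over the table; uses a fixed left-to-right
    order, whereas an any-order model may condition on any subset of coordinates.
    """
    k = len(prefix)
    mass = [0.0] * q
    for s, p in mu.items():
        if s[:k] == prefix:
            mass[s[k]] += p
    total = sum(mass)
    return [m / total for m in mass]

def sequential_sample(mu, n, q, rng):
    """Baseline sampler: fixes one coordinate per oracle call, n calls in sequence."""
    prefix = ()
    for _ in range(n):
        probs = conditional_marginal(mu, prefix, q)
        a = rng.choices(range(q), weights=probs)[0]
        prefix = prefix + (a,)
    return prefix

def rejection_sample(mu, nu, M, rng):
    """Textbook rejection sampling with proposal nu (not the paper's algorithm).

    Proposes x ~ nu and accepts with probability mu(x) / (M * nu(x)); the output
    is exactly mu-distributed provided mu(x) <= M * nu(x) for every x.
    """
    seqs = list(nu)
    weights = [nu[s] for s in seqs]
    while True:
        x = rng.choices(seqs, weights=weights)[0]
        if rng.random() < mu[x] / (M * nu[x]):
            return x

if __name__ == "__main__":
    rng = random.Random(1)
    n, q = 3, 2
    mu = make_toy_target(n, q)
    print("sequential:", sequential_sample(mu, n, q, rng))
    # Hypothetical crude speculation: uniform nu, with the tightest valid M.
    nu = {s: 1.0 / len(mu) for s in mu}
    M = max(mu[s] / nu[s] for s in mu)
    print("rejection:", rejection_sample(mu, nu, M, rng), "expected tries:", M)
```

In plain rejection sampling the expected number of proposals per accepted sample is $M$, so a proposal $\nu$ that approximates $\mu$ more closely (smaller $M$) wastes fewer draws. The uniform $\nu$ above is only a placeholder for a speculative distribution.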

Keywords

parallel algorithm, randomized algorithm

Cite

@article{arxiv.2511.07869,
  title  = {Parallel Sampling via Autospeculation},
  author = {Nima Anari and Carlo Baronio and CJ Chen and Alireza Haqi and Frederic Koehler and Anqi Li and Thuy-Duong Vuong},
  journal= {arXiv preprint arXiv:2511.07869},
  year   = {2025}
}
