Related papers: Parallel Sampling via Autospeculation

200 papers

We show how to use parallelization to speed up sampling from an arbitrary distribution $\mu$ on a product space $[q]^n$, given oracle access to counting queries: $\mathbb{P}_{X\sim \mu}[X_S=\sigma_S]$ for any $S\subseteq [n]$ and $\sigma_S…

Data Structures and Algorithms · Computer Science 2024-08-20 Nima Anari, Ruiquan Gao, Aviad Rubinstein

Diffusion models have become a leading method for generative modeling of both image and scientific data. As these models are costly to train and evaluate, reducing the inference cost for diffusion models remains a major goal.…

Machine Learning · Computer Science 2025-12-01 Haoxuan Chen, Yinuo Ren, Lexing Ying, Grant M. Rotskoff

Diffusion models are powerful generative models but suffer from slow sampling, often taking 1000 sequential denoising steps for one sample. As a result, considerable efforts have been directed toward reducing the number of denoising steps,…

Machine Learning · Computer Science 2023-10-17 Andy Shih, Suneel Belkhale, Stefano Ermon, Dorsa Sadigh, Nima Anari

Diffusion models have emerged as state-of-the-art generative models for image generation. However, sampling from diffusion models is usually time-consuming due to the inherent autoregressive nature of their sampling process. In this work,…

Machine Learning · Computer Science 2024-05-28 Zhiwei Tang, Jiasheng Tang, Hao Luo, Fan Wang, Tsung-Hui Chang

Sampling algorithms play an important role in controlling the quality and runtime of diffusion model inference. In recent years, a number of works \cite{chen2023sampling,chen2023ode,benton2023error,lee2022convergence} have proposed schemes…

Machine Learning · Computer Science 2024-10-18 Shivam Gupta, Linda Cai, Sitan Chen

Speculative decoding has emerged as a widely adopted method to accelerate large language model inference without sacrificing the quality of the model outputs. While this technique has facilitated notable speed improvements by enabling…

Computation and Language · Computer Science 2025-02-12 Jacob K Christopher, Brian R Bartoldson, Tal Ben-Nun, Michael Cardei, Bhavya Kailkhura, Ferdinando Fioretto

Sampling from high-dimensional probability distributions is fundamental in machine learning and statistics. As datasets grow larger, computational efficiency becomes increasingly important, particularly in reducing adaptive complexity,…

Data Structures and Algorithms · Computer Science 2025-09-23 Huanjian Zhou, Masashi Sugiyama

In diffusion models, samples are generated through an iterative refinement process, requiring hundreds of sequential model evaluations. Several recent methods have introduced approximations (fewer discretization steps or distillation) to…

Machine Learning · Computer Science 2024-12-12 Nikil Roashan Selvam, Amil Merchant, Stefano Ermon

In this paper, we design an algorithm to accelerate the diffusion process on the $SO(3)$ manifold. The inherently sequential nature of diffusion models necessitates substantial time for denoising perturbed data. To overcome this limitation,…

Machine Learning · Computer Science 2025-07-15 Yan-Ting Chen, Hao-Wei Chen, Tsu-Ching Hsiao, Chun-Yi Lee

In arbitrary-order language models, it is an open question how to sample tokens in parallel from the correct joint distribution. With discrete diffusion models, the more tokens they generate in parallel, the less their predicted…

Machine Learning · Computer Science 2025-04-30 Gabe Guo, Stefano Ermon

Inference from large autoregressive models like Transformers is slow - decoding K tokens takes K serial runs of the model. In this work we introduce speculative decoding - an algorithm to sample from autoregressive models faster without any…

Machine Learning · Computer Science 2023-05-22 Yaniv Leviathan, Matan Kalman, Yossi Matias

We present speculative sampling, an algorithm for accelerating transformer decoding by enabling the generation of multiple tokens from each transformer call. Our algorithm relies on the observation that the latency of parallel scoring of…

Computation and Language · Computer Science 2023-02-03 Charlie Chen, Sebastian Borgeaud, Geoffrey Irving, Jean-Baptiste Lespiau, Laurent Sifre, John Jumper

Denoising Diffusion Probabilistic Models (DDPMs) have emerged as powerful tools for generative modeling. However, their sequential computation requirements lead to significant inference-time bottlenecks. In this work, we utilize the…

Machine Learning · Computer Science 2025-08-08 Hengyuan Hu, Aniket Das, Dorsa Sadigh, Nima Anari

Speculative decoding has proven to be an efficient solution to large language model (LLM) inference, where the small drafter predicts future tokens at a low cost, and the target model is leveraged to verify them in parallel. However, most…

Computation and Language · Computer Science 2024-10-10 Zilin Xiao, Hongming Zhang, Tao Ge, Siru Ouyang, Vicente Ordonez, Dong Yu

Scaling the size of language models to tens of billions of parameters has led to impressive performance on a wide range of tasks. At generation, these models are used auto-regressively, requiring a forward pass for each generated token, and…

Computation and Language · Computer Science 2023-11-23 Giovanni Monea, Armand Joulin, Edouard Grave

Continuous visual autoregressive (AR) models have demonstrated promising performance in image generation. However, the heavy autoregressive inference burden imposes significant overhead. In Large Language Models (LLMs), speculative decoding…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Zili Wang, Robert Zhang, Kun Ding, Qi Yang, Fei Li, Shiming Xiang

Autoregressive decoding is bottlenecked by its sequential nature. Speculative decoding has become a standard way to accelerate inference by using a fast draft model to predict upcoming tokens from a slower target model, and then verifying…

Machine Learning · Computer Science 2026-05-06 Tanishq Kumar, Tri Dao, Avner May

Autoregressive sampling from large language models has led to state-of-the-art results in several natural language tasks. However, autoregressive sampling generates tokens one at a time making it slow, and even prohibitive in certain tasks.…

Machine Learning · Computer Science 2024-01-19 Ziteng Sun, Ananda Theertha Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix Yu

Diffusion models have achieved remarkable success in generating high-fidelity content but suffer from slow, iterative sampling, resulting in high latency that limits their use in interactive applications. We introduce DRiffusion, a parallel…

Machine Learning · Computer Science 2026-03-30 Runsheng Bai, Chengyu Zhang, Yangdong Deng

Diffusion models have found widespread adoption in various areas. However, their sampling process is slow because it requires hundreds to thousands of network evaluations to emulate a continuous process defined by differential equations. In…

Machine Learning · Computer Science 2023-07-25 Hongkai Zheng, Weili Nie, Arash Vahdat, Kamyar Azizzadenesheli, Anima Anandkumar