FutureFill: Fast Generation from Convolutional Sequence Models

Machine Learning; Artificial Intelligence; Computation and Language. 2025-06-24 (v3)

Abstract

We address the challenge of efficient auto-regressive generation in sequence prediction models by introducing FutureFill, a general-purpose fast generation method for any sequence prediction algorithm based on convolutional operators. FutureFill reduces generation time from quadratic to quasilinear in the context length. Moreover, when generating from a prompt, it requires a prefill cache whose size grows only with the number of tokens to be generated, which is often much smaller than the caches required by standard convolutional or attention-based models. We validate our theoretical claims with experiments on synthetic tasks and demonstrate substantial efficiency gains when generating from a deep convolutional sequence prediction model.
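
The prompt-cache claim can be illustrated with a small sketch. It assumes a toy scalar model in which the output is y[t] = sum_j phi[t-j] * u[j] and the next input is g(y[t]). The names `phi`, `g`, `prompt_cache`, and both generator functions are hypothetical, chosen only for this illustration. The sketch shows only that the prompt's contribution to the K future outputs fits in a cache of size K, computed once with an FFT convolution. The contribution of the generated tokens is still accumulated naively here, so this is not the quasilinear FutureFill algorithm itself.

```python
import numpy as np

def fft_convolve(a, b):
    """Full linear convolution of two 1-D arrays via the FFT."""
    n = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)

def generate_naive(prompt, phi, K, g=np.tanh):
    """Baseline: each step re-reads the whole history (quadratic overall)."""
    u = list(prompt)
    L = len(prompt)
    for s in range(K):
        t = L - 1 + s
        # y[t] = sum_j phi[t - j] * u[j] over all inputs seen so far
        y = np.dot(phi[: t + 1][::-1], np.asarray(u))
        u.append(g(y))
    return np.array(u[L:])

def prompt_cache(prompt, phi, K):
    """Contribution of the prompt to the next K outputs: an array of size K."""
    L = len(prompt)
    full = fft_convolve(prompt, phi[: L + K - 1])
    return full[L - 1 : L - 1 + K]

def generate_with_cache(prompt, phi, K, g=np.tanh):
    """Generate K tokens using only the size-K prompt cache plus generated tokens."""
    cache = prompt_cache(prompt, phi, K)
    gen = np.zeros(K)
    for s in range(K):
        # Prompt part comes from the cache; generated part is summed naively
        # (assumption of this sketch, not the paper's method).
        y = cache[s] + np.dot(phi[:s][::-1], gen[:s])
        gen[s] = g(y)
    return gen
```

After `prompt_cache` runs, the prompt itself is no longer needed: `generate_with_cache` touches only the K cached values and the tokens generated so far, and it should agree with `generate_naive` up to floating-point error.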

Cite

@article{arxiv.2410.03766,
  title   = {FutureFill: Fast Generation from Convolutional Sequence Models},
  author  = {Naman Agarwal and Xinyi Chen and Evan Dogariu and Devan Shah and Hubert Strauss and Vlad Feinberg and Daniel Suo and Peter Bartlett and Elad Hazan},
  journal = {arXiv preprint arXiv:2410.03766},
  year    = {2025}
}