Self-Speculative Masked Diffusions

Andrew Campbell; Valentin De Bortoli; Jiaxin Shi; Arnaud Doucet

Machine Learning, 2026-03-09, v2

Abstract

We present self-speculative masked diffusions, a new class of masked diffusion generative models for discrete data that require significantly fewer function evaluations to generate samples. Standard masked diffusion models predict factorized logits over currently masked positions. A number of masked positions are then sampled; however, because of the factorization approximation, sampling too many positions at once leads to poor sample quality. As a result, many simulation steps, and therefore many neural network function evaluations, are required to generate high-quality data. We reduce the computational burden by generating non-factorized predictions over masked positions. This is achieved by modifying the final transformer attention mask from non-causal to causal, enabling draft token generation and parallel validation via a novel, model-integrated speculative sampling mechanism. This results in a non-factorized predictive distribution over masked positions in a single forward pass. We apply our method to GPT2-scale text modelling and protein sequence generation, finding that we can achieve a ~2x reduction in the required number of network forward passes relative to standard masked diffusion models.
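
The draft-then-validate idea mentioned in the abstract can be illustrated with the generic speculative sampling accept/reject rule for a single token. This is a minimal sketch of that textbook rule, not the paper's model-integrated mechanism: the function name `speculative_step`, the toy distributions, and the use of plain probability lists are all assumptions made for illustration. A draft token is drawn from a cheap draft distribution `q_draft`, accepted with probability min(1, p/q) under the target distribution `p_target`, and otherwise replaced by a sample from the normalized residual max(p - q, 0), so that the returned token is distributed according to `p_target`.

```python
import random


def speculative_step(p_target, q_draft, rng):
    """One accept/reject step of generic speculative sampling (illustrative).

    p_target, q_draft: probability lists over the same toy vocabulary.
    Returns (token, accepted), where token is distributed as p_target.
    """
    vocab = list(range(len(q_draft)))
    # Draw a draft token from the cheap draft distribution.
    draft = rng.choices(vocab, weights=q_draft, k=1)[0]
    # Accept the draft with probability min(1, p(draft) / q(draft)).
    if rng.random() < min(1.0, p_target[draft] / q_draft[draft]):
        return draft, True
    # On rejection, resample from the residual max(p - q, 0);
    # rng.choices normalizes the weights internally.
    residual = [max(p - q, 0.0) for p, q in zip(p_target, q_draft)]
    token = rng.choices(vocab, weights=residual, k=1)[0]
    return token, False
```

Calling `speculative_step([0.6, 0.3, 0.1], [0.2, 0.5, 0.3], random.Random(0))` repeatedly should yield tokens whose empirical frequencies approach the target probabilities, even though every draft comes from the mismatched draft distribution.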

Keywords

density estimation and sampling; matrix factorization; distribution theory

Cite

@article{arxiv.2510.03929,
  title  = {Self-Speculative Masked Diffusions},
  author = {Andrew Campbell and Valentin De Bortoli and Jiaxin Shi and Arnaud Doucet},
  journal= {arXiv preprint arXiv:2510.03929},
  year   = {2026}
}

Comments

32 pages, 7 figures, 4 tables
