
Speeding Up MCMC by Efficient Data Subsampling

Subjects: Methodology; Computation; Machine Learning. Version 6, 2018-12-31.

Abstract

We propose Subsampling MCMC, a Markov chain Monte Carlo (MCMC) framework where the likelihood function for $n$ observations is estimated from a random subset of $m$ observations. We introduce a highly efficient unbiased estimator of the log-likelihood based on control variates, such that the computing cost is much smaller than that of the full log-likelihood in standard MCMC. The likelihood estimate is bias-corrected and used in two dependent pseudo-marginal algorithms to sample from a perturbed posterior, for which we derive the asymptotic error with respect to $n$ and $m$, respectively. We propose a practical estimator of the error and show that the error is negligible even for a very small $m$ in our applications. We demonstrate that Subsampling MCMC is substantially more efficient than standard MCMC in terms of sampling efficiency for a given computational budget, and that it outperforms other subsampling methods for MCMC proposed in the literature.
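The abstract does not spell out the estimator, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. A hypothetical one-parameter logistic regression stands in for the model, the control variates $q_k$ are assumed to be second-order Taylor expansions of each log-likelihood term $\ell_k$ around a fixed reference value $\theta^\star$, and observations are drawn by simple random sampling with replacement. The difference estimator $\hat\ell_m(\theta)=\sum_{k=1}^n q_k(\theta)+\frac{n}{m}\sum_{i=1}^m\bigl(\ell_{u_i}(\theta)-q_{u_i}(\theta)\bigr)$ is unbiased for $\ell(\theta)=\sum_k \ell_k(\theta)$, and the sketch uses $\exp(\hat\ell_m-\hat\sigma^2/2)$ as an approximately bias-corrected likelihood estimate. The sampler draws a fresh, independent subsample at every iteration, which is a plain pseudo-marginal scheme rather than the dependent schemes the abstract refers to. All names (`DifferenceEstimator`, `subsampling_mh`, `theta_star`, `prior_sd`) are hypothetical.

```python
import numpy as np

# Illustrative model (an assumption, not from the paper): one-parameter logistic
# regression, l_k(theta) = y_k * x_k * theta - log(1 + exp(x_k * theta)).
def log_lik_terms(theta, x, y):
    eta = x * theta
    return y * eta - np.logaddexp(0.0, eta)


class DifferenceEstimator:
    """Control-variate (difference) estimator of l(theta) = sum_k l_k(theta).

    Assumed control variates: q_k is the second-order Taylor expansion of l_k
    around a fixed reference point theta_star (e.g. a posterior mode).
    """

    def __init__(self, x, y, theta_star):
        self.x, self.y, self.theta_star = x, y, theta_star
        self.n = len(x)
        eta = x * theta_star
        p = 1.0 / (1.0 + np.exp(-eta))
        self.l0 = y * eta - np.logaddexp(0.0, eta)  # l_k(theta_star)
        self.g = x * (y - p)                         # first derivative at theta_star
        self.h = -(x ** 2) * p * (1.0 - p)           # second derivative at theta_star
        # One O(n) pass; afterwards sum_k q_k(theta) costs O(1) for any theta.
        self.A, self.B, self.C = self.l0.sum(), self.g.sum(), self.h.sum()

    def estimate(self, theta, idx):
        """Return (ell_hat, sigma2_hat) from the observations in idx (len >= 2)."""
        d = theta - self.theta_star
        m = len(idx)
        q_sum = self.A + self.B * d + 0.5 * self.C * d ** 2
        q_sub = self.l0[idx] + self.g[idx] * d + 0.5 * self.h[idx] * d ** 2
        diff = log_lik_terms(theta, self.x[idx], self.y[idx]) - q_sub
        ell_hat = q_sum + self.n * diff.mean()           # unbiased for l(theta)
        sigma2_hat = self.n ** 2 / m * diff.var(ddof=1)  # estimated Var(ell_hat)
        return ell_hat, sigma2_hat


def subsampling_mh(est, m, n_iter, step, rng, prior_sd=10.0):
    """Random-walk pseudo-marginal MH targeting exp(ell_hat - sigma2_hat / 2)
    times an assumed N(0, prior_sd^2) prior. A fresh subsample is drawn for each
    proposal; the correlated/block schemes of the paper are not reproduced."""
    def log_target(theta):
        idx = rng.integers(0, est.n, size=m)  # sampling with replacement
        ell_hat, s2 = est.estimate(theta, idx)
        return ell_hat - 0.5 * s2 - 0.5 * (theta / prior_sd) ** 2

    theta = est.theta_star
    lt = log_target(theta)  # the current estimate is kept, as pseudo-marginal MH requires
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        lt_prop = log_target(prop)
        if np.log(rng.uniform()) < lt_prop - lt:
            theta, lt = prop, lt_prop
        draws[i] = theta
    return draws
```

For example, with NumPy arrays `x` and `y` and `theta_star` near the posterior mode, `subsampling_mh(DifferenceEstimator(x, y, theta_star), m=100, n_iter=5000, step=0.05, rng=np.random.default_rng(0))` returns the chain. Precomputing the sums `A`, `B` and `C` once is what keeps the per-iteration cost proportional to $m$ rather than $n$.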


Cite

@article{arxiv.1404.4178,
  title  = {Speeding Up MCMC by Efficient Data Subsampling},
  author = {Matias Quiroz and Robert Kohn and Mattias Villani and Minh-Ngoc Tran},
  journal= {arXiv preprint arXiv:1404.4178},
  year   = {2018}
}

Comments

Main changes: the theory has been significantly revised.
