Simultaneous Inference for Massive Data: Distributed Bootstrap

Machine Learning, 2020-02-21, v1

Abstract

In this paper, we propose a bootstrap method for massive data processed in a distributed manner across a large number of machines. The new method is computationally efficient in that we bootstrap on the master machine without the over-resampling typically required by existing methods \cite{kleiner2014scalable,sengupta2016subsampled}, while provably achieving optimal statistical efficiency with minimal communication. Our method does not require repeatedly refitting the model; it only applies the multiplier bootstrap, on the master machine, to the gradients received from the worker machines. Simulations validate our theory.
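
The central idea, that the master bootstraps on worker gradients instead of resampling data and refitting, can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes a toy model (estimating a d-dimensional mean under squared loss, so the Hessian is the identity and the one-step correction needs no matrix inversion), equal sample sizes on all workers, an initial estimate computed from the master's own shard (worker 0 here), and standard Gaussian multipliers. The names `local_gradient` and `distributed_multiplier_bootstrap` are hypothetical.

```python
import math
import random


def local_gradient(theta, data):
    """Gradient of the average loss 0.5 * ||x - theta||^2 over one worker's data.

    Toy assumption: the parameter is a d-dimensional mean, so the gradient is
    theta minus the local sample mean and the Hessian is the identity.
    """
    n = len(data)
    return [theta[l] - sum(x[l] for x in data) / n for l in range(len(theta))]


def distributed_multiplier_bootstrap(worker_data, alpha=0.05, B=500, seed=0):
    """Hypothetical sketch: simultaneous CIs from one round of gradient communication."""
    rng = random.Random(seed)
    k = len(worker_data)
    n = len(worker_data[0])  # assumed equal sample size n on every worker
    d = len(worker_data[0][0])
    N = n * k

    # Initial estimate, computed on the master from its own shard (worker 0).
    theta_hat = [sum(x[l] for x in worker_data[0]) / n for l in range(d)]

    # One communication round: each worker sends a single d-vector gradient.
    grads = [local_gradient(theta_hat, data) for data in worker_data]
    g_bar = [sum(g[l] for g in grads) / k for l in range(d)]

    # One Newton step; with an identity Hessian this is theta_hat - g_bar.
    theta_tilde = [theta_hat[l] - g_bar[l] for l in range(d)]

    # Multiplier bootstrap on the master: reweight the k centered gradients
    # with i.i.d. N(0, 1) multipliers; no data are resampled, no model is refit.
    stats = []
    for _ in range(B):
        eps = [rng.gauss(0.0, 1.0) for _ in range(k)]
        stats.append(max(
            abs(sum(eps[j] * math.sqrt(n) * (grads[j][l] - g_bar[l]) for j in range(k)))
            / math.sqrt(k)
            for l in range(d)
        ))

    # Empirical (1 - alpha) quantile of the max-type statistic.
    stats.sort()
    c = stats[min(B - 1, math.ceil((1 - alpha) * B) - 1)]

    # Simultaneous intervals: theta_tilde_l +/- c / sqrt(N) for every coordinate l.
    half = c / math.sqrt(N)
    return theta_tilde, [(t - half, t + half) for t in theta_tilde]


# Usage: 50 simulated workers, each holding 200 draws from N(true_mean, I).
data_rng = random.Random(1)
true_mean = [0.0, 1.0, -2.0]
workers = [[[m + data_rng.gauss(0.0, 1.0) for m in true_mean] for _ in range(200)]
           for _ in range(50)]
est, cis = distributed_multiplier_bootstrap(workers)
for lo, hi in cis:
    print(f"[{lo:.3f}, {hi:.3f}]")
```

Each bootstrap replicate costs O(kd) arithmetic on the master and no further communication, which is the kind of saving the abstract contrasts with resampling-based methods. In a general model one would presumably precondition the gradients with an estimate of the inverse Hessian rather than rely on the identity, as this toy example does.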

Cite

@article{arxiv.2002.08443,
  title  = {Simultaneous Inference for Massive Data: Distributed Bootstrap},
  author = {Yang Yu and Shih-Kang Chao and Guang Cheng},
  journal= {arXiv preprint arXiv:2002.08443},
  year   = {2020}
}