Randomized Distributed Mean Estimation: Accuracy vs Communication

Jakub Konečný; Peter Richtárik

Subjects: Distributed, Parallel, and Cluster Computing; Numerical Analysis; Machine Learning
Date: 2016-11-24 (v1)

Abstract

We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint. Our analysis does not rely on any statistical assumptions about the source of the vectors. This problem arises as a subproblem in many applications, including reduce-all operations within algorithms for distributed and federated optimization and learning. We propose a flexible family of randomized algorithms exploring the trade-off between expected communication cost and estimation error. Our family contains the full-communication and zero-error method on one extreme, and an $\epsilon$-bit communication and $\mathcal{O}(1/(\epsilon n))$ error method on the opposite extreme. In the special case where we communicate, in expectation, a single bit per coordinate of each vector, we improve upon existing results by obtaining $\mathcal{O}(r/n)$ error, where $r$ is the number of bits used to represent a floating point value.
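
The trade-off described in the abstract can be illustrated with a small simulation. The abstract itself does not spell out an encoding, so the sketch below assumes one simple unbiased scheme: each node keeps a coordinate with probability `p` (rescaled so the expectation is preserved) and otherwise substitutes a per-node centre `mu`, taken here to be the average of that node's coordinates. The names `encode`, `estimate_mean`, `p` and `mu` are hypothetical and chosen for illustration only; this is not presented as the authors' exact protocol.

```python
import random


def encode(x, p, mu, rng):
    """Randomly sparsify x, keeping each coordinate with probability p.

    Assumed scheme (illustrative only): a kept coordinate becomes
    x_j / p - (1 - p) / p * mu, a dropped one becomes mu. Then
    E[y_j] = p * (x_j / p - (1 - p) / p * mu) + (1 - p) * mu = x_j,
    so the encoding is unbiased.
    Returns the encoded vector and the number of coordinates "sent".
    """
    y, sent = [], 0
    for xj in x:
        if rng.random() < p:
            y.append(xj / p - (1.0 - p) / p * mu)
            sent += 1
        else:
            y.append(mu)  # server already knows mu, nothing extra is sent
    return y, sent


def exact_mean(vectors):
    """Coordinate-wise arithmetic average of equally sized vectors."""
    n, d = len(vectors), len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(d)]


def estimate_mean(vectors, p, seed=0):
    """Average the encoded vectors; returns (estimate, coordinates sent)."""
    rng = random.Random(seed)
    encoded, sent_total = [], 0
    for x in vectors:
        mu = sum(x) / len(x)  # assumption: one extra scalar per node
        y, sent = encode(x, p, mu, rng)
        encoded.append(y)
        sent_total += sent
    return exact_mean(encoded), sent_total


if __name__ == "__main__":
    data = [[0.1, 0.9, 0.4], [0.7, 0.2, 0.5], [0.3, 0.3, 0.8], [0.6, 0.1, 0.2]]
    print("exact   :", exact_mean(data))
    for p in (1.0, 0.5, 0.1):
        est, sent = estimate_mean(data, p)
        print(f"p={p}: sent {sent} coordinates, estimate {est}")
```

With `p = 1` every coordinate is sent and the estimate coincides with the exact average (the full-communication, zero-error extreme). Lowering `p` reduces the expected number of transmitted coordinates at the price of higher variance in the estimate.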

Keywords

distributed optimization, Gaussian estimation, leader election

Cite

@article{arxiv.1611.07555,
  title  = {Randomized Distributed Mean Estimation: Accuracy vs Communication},
  author = {Jakub Konečný and Peter Richtárik},
  journal= {arXiv preprint arXiv:1611.07555},
  year   = {2016}
}

Comments

19 pages, 1 figure
