On Data Dependence in Distributed Stochastic Optimization

Avleen S. Bijral; Anand D. Sarwate; Nathan Srebro

Optimization and Control 2016-09-02 v2

Abstract

We study a distributed consensus-based stochastic gradient descent (SGD) algorithm and show that the rate of convergence involves the spectral properties of two matrices: the standard spectral gap of a weight matrix from the network topology and a new term depending on the spectral norm of the sample covariance matrix of the data. This data-dependent convergence rate shows that distributed SGD algorithms perform better on datasets whose sample covariance matrix has a small spectral norm. Our analysis method also allows us to find data-dependent convergence rates as we limit the amount of communication. Spreading a fixed amount of data across more nodes slows convergence; for asymptotically growing datasets we show that adding more machines can help when minimizing twice-differentiable losses.
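To make the two spectral quantities in the abstract concrete, the following is a minimal NumPy sketch, not the authors' algorithm or experimental setup. It assumes a ring topology with uniform 1/3 weights, a least-squares loss, a step size proportional to 1/sqrt(t), and the uncentered second-moment matrix (1/N) X^T X as the "sample covariance". All function names and parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def ring_weight_matrix(m):
    """Symmetric, doubly stochastic weights for a ring of m >= 3 nodes.

    Assumption: each node weights itself and its two neighbors by 1/3.
    """
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % m] = 1.0 / 3.0
        W[i, (i + 1) % m] = 1.0 / 3.0
    return W


def spectral_gap(W):
    """One minus the second-largest eigenvalue magnitude of a symmetric W."""
    magnitudes = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return 1.0 - magnitudes[1]


def second_moment_spectral_norm(X):
    """Spectral norm of (1/N) X^T X (uncentered; an assumption of this sketch)."""
    return np.linalg.norm(X.T @ X / X.shape[0], 2)


def consensus_sgd(X, y, W, steps=2000, lr=0.05, seed=0):
    """Consensus-based SGD for least squares; returns one iterate per node.

    Each node holds a disjoint shard of the data. In every round a node
    samples one local point, computes a stochastic gradient, and the
    iterates are mixed with the weight matrix W before the gradient step.
    """
    rng = np.random.default_rng(seed)
    m = W.shape[0]
    N, d = X.shape
    shards = np.array_split(rng.permutation(N), m)
    w = np.zeros((m, d))  # row i is the iterate held by node i
    for t in range(1, steps + 1):
        grads = np.zeros((m, d))
        for i, idx in enumerate(shards):
            j = rng.choice(idx)  # one local sample per node
            grads[i] = (X[j] @ w[i] - y[j]) * X[j]
        # Mix with neighbors, then take a decaying-step gradient step.
        w = W @ w - (lr / np.sqrt(t)) * grads
    return w
```

With synthetic data, one could call `consensus_sgd(X, y, ring_weight_matrix(4))` and compare the averaged node iterates against the generating parameter, while `spectral_gap` and `second_moment_spectral_norm` report the network-dependent and data-dependent quantities that the abstract says enter the convergence rate.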

Keywords

stochastic gradient descent, distributed optimization, stochastic optimization

Cite

@article{arxiv.1603.04379,
  title  = {On Data Dependence in Distributed Stochastic Optimization},
  author = {Avleen S. Bijral and Anand D. Sarwate and Nathan Srebro},
  journal= {arXiv preprint arXiv:1603.04379},
  year   = {2016}
}
