Fast Distributed Gradient Methods

Dusan Jakovetic; Joao Xavier; Jose M. F. Moura

Information Theory 2014-04-15 v4 math.IT

Authors: Dusan Jakovetic, Joao Xavier, Jose M. F. Moura

Abstract

We study distributed optimization problems in which $N$ nodes minimize the sum of their individual costs subject to a common vector variable. The costs are convex and have Lipschitz continuous gradients (with constant $L$) and bounded gradients. We propose two fast distributed gradient algorithms based on the centralized Nesterov gradient algorithm and establish their convergence rates in terms of the per-node communications $\mathcal{K}$ and the per-node gradient evaluations $k$. Our first method, Distributed Nesterov Gradient, achieves rates $O\left({\log \mathcal{K}}/{\mathcal{K}}\right)$ and $O\left({\log k}/{k}\right)$. Our second method, Distributed Nesterov gradient with Consensus iterations, assumes that all nodes know $L$ and $\mu(W)$, the second largest singular value of the $N \times N$ doubly stochastic weight matrix $W$. It achieves rates $O\left({1}/{\mathcal{K}^{2-\xi}}\right)$ and $O\left({1}/{k^2}\right)$ ($\xi>0$ arbitrarily small). Further, for both methods we give the explicit dependence of the convergence constants on $N$ and $W$. Simulation examples illustrate our findings.
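The setting in the abstract can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the paper's exact algorithm: the abstract does not give the update rule, so the code assumes a Nesterov-style iteration in which each node averages its neighbors' auxiliary variables through $W$, takes a local gradient step with a diminishing step size $c/(k+1)$, and then applies momentum $k/(k+3)$. The ring network, the Huber costs (convex, with Lipschitz and bounded gradients), and all function names are hypothetical choices made for the example. The helper for $\mu(W)$ relies on the fact that a doubly stochastic $W$ has largest singular value 1 with the all-ones singular vector, so $\mu(W)$ is the spectral norm of $W - \frac{1}{N}\mathbf{1}\mathbf{1}^\top$.

```python
import math


def second_largest_singular_value(W, iters=200):
    """Estimate mu(W) for a doubly stochastic W by power iteration.

    mu(W) equals the spectral norm of M = W - (1/N) * ones * ones^T,
    which is estimated here by power iteration on M^T M.
    """
    n = len(W)
    M = [[W[i][j] - 1.0 / n for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start vector
    sigma = 0.0
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]  # u = M v
        w = [sum(M[i][j] * u[i] for i in range(n)) for j in range(n)]  # w = M^T u
        norm_v = math.sqrt(sum(t * t for t in v))
        norm_u = math.sqrt(sum(t * t for t in u))
        sigma = norm_u / norm_v  # current estimate of ||M||_2
        norm_w = math.sqrt(sum(t * t for t in w))
        if norm_w == 0.0:  # M v vanished, e.g. W is the exact averaging matrix
            break
        v = [t / norm_w for t in w]
    return sigma


def huber_grad(x, a, delta):
    """Gradient of a scalar Huber cost centred at a.

    The gradient is Lipschitz with constant L = 1 and bounded by delta.
    """
    return max(-delta, min(delta, x - a))


def distributed_nesterov_gradient(W, centers, iters=2000, c=0.5, delta=10.0):
    """Nesterov-style distributed gradient sketch (assumed update form).

    Each iteration uses one communication round (neighbors exchange y)
    and one local gradient evaluation per node.
    """
    n = len(W)
    x = [0.0] * n  # local estimates of the common variable
    y = [0.0] * n  # auxiliary (momentum) variables
    for k in range(iters):
        alpha = c / (k + 1)  # assumed diminishing step size
        beta = k / (k + 3)   # assumed Nesterov momentum weight
        x_new = [
            sum(W[i][j] * y[j] for j in range(n))
            - alpha * huber_grad(y[i], centers[i], delta)
            for i in range(n)
        ]
        y = [x_new[i] + beta * (x_new[i] - x[i]) for i in range(n)]
        x = x_new
    return x


if __name__ == "__main__":
    # Hypothetical 4-node ring: weight 1/2 on self, 1/4 on each neighbor.
    W = [
        [0.50, 0.25, 0.00, 0.25],
        [0.25, 0.50, 0.25, 0.00],
        [0.00, 0.25, 0.50, 0.25],
        [0.25, 0.00, 0.25, 0.50],
    ]
    centers = [1.0, 2.0, 3.0, 6.0]  # the sum of the costs is minimized at their mean
    print("mu(W) estimate:", second_largest_singular_value(W))
    print("node estimates:", distributed_nesterov_gradient(W, centers))
```

In this toy instance every node's estimate should approach the mean of the centers, which minimizes the summed cost as long as the iterates stay in the quadratic region of each Huber cost. The value of $\mu(W)$ is computed only to show the quantity the second method assumes known; the sketch does not implement the consensus-iteration variant.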

Keywords

distributed optimization; stochastic gradient descent; distributed training

Cite

@article{arxiv.1112.2972,
  title  = {Fast Distributed Gradient Methods},
  author = {Dusan Jakovetic and Joao Xavier and Jose M. F. Moura},
  journal= {arXiv preprint arXiv:1112.2972},
  year   = {2014}
}
