Accelerating Variance-Reduced Stochastic Gradient Methods

Derek Driggs; Matthias J. Ehrhardt; Carola-Bibiane Schönlieb

Optimization and Control 2020-10-30 v3

Abstract

Variance reduction is a crucial tool for improving the slow convergence of stochastic gradient descent. Only a few variance-reduced methods, however, have yet been shown to directly benefit from Nesterov's acceleration techniques to match the convergence rates of accelerated gradient methods. Such approaches rely on "negative momentum", a technique for further variance reduction that is generally specific to the SVRG gradient estimator. In this work, we show that negative momentum is unnecessary for acceleration and develop a universal acceleration framework that allows all popular variance-reduced methods to achieve accelerated convergence rates. The constants appearing in these rates, including their dependence on the number of functions $n$, scale with the mean-squared-error and bias of the gradient estimator. In a series of numerical experiments, we demonstrate that versions of SAGA, SVRG, SARAH, and SARGE using our framework significantly outperform non-accelerated versions and compare favourably with algorithms using negative momentum.

Keywords

accelerated gradient methods; gradient descent; optimization; stochastic gradient descent

Cite

@article{arxiv.1910.09494,
  title  = {Accelerating Variance-Reduced Stochastic Gradient Methods},
  author = {Derek Driggs and Matthias J. Ehrhardt and Carola-Bibiane Schönlieb},
  journal= {arXiv preprint arXiv:1910.09494},
  year   = {2020}
}

Comments

33 pages
