Variance-Reduced Methods for Machine Learning

Machine Learning | 2020-10-05 | v1 | Optimization and Control, Machine Learning

Abstract

Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last 8 years have seen an exciting new development: variance reduction (VR) for stochastic optimization methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving faster convergence than SGD in theory as well as in practice. These speedups underline the surge of interest in VR methods and the fast-growing body of work on this topic. This review covers the key principles and main developments behind VR methods for optimization with finite data sets and is aimed at non-expert readers. We focus mainly on the convex setting and leave pointers for readers interested in extensions to minimizing non-convex functions.
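To make the idea of variance reduction concrete, the following is a minimal sketch of SVRG, one well-known VR method, applied to a least-squares problem. The abstract does not name a specific algorithm, so the choice of SVRG, the objective, the function name `svrg_least_squares`, and all parameters are assumptions made for illustration only. Each epoch spends one pass over the data on a full gradient at a reference point, then uses it to correct cheap single-sample gradients.

```python
import numpy as np

def svrg_least_squares(A, b, step, n_epochs, seed=0):
    """Illustrative SVRG sketch for f(x) = (1/2n) * ||A x - b||^2.

    Hypothetical helper, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x_ref = np.zeros(d)  # reference point, refreshed once per epoch
    for _ in range(n_epochs):
        # One full pass over the data: exact gradient at the reference point.
        full_grad = A.T @ (A @ x_ref - b) / n
        x = x_ref.copy()
        for _ in range(n):
            i = rng.integers(n)  # sample one data point uniformly
            a_i = A[i]
            grad_i_x = a_i * (a_i @ x - b[i])        # sample gradient at x
            grad_i_ref = a_i * (a_i @ x_ref - b[i])  # same sample at x_ref
            # Unbiased gradient estimate; the correction term cancels most
            # of the sampling noise when x is close to x_ref.
            x -= step * (grad_i_x - grad_i_ref + full_grad)
        x_ref = x  # last inner iterate becomes the new reference point
    return x_ref
```

In this sketch a constant step size such as `0.1 / max_i ||a_i||^2` is a conservative assumption, not a tuned value. Unlike plain SGD with a constant step, the noise in the corrected estimate shrinks as the iterates approach the minimizer, which is what allows the multiple passes over the data to pay off.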

Cite

@article{arxiv.2010.00892,
  title  = {Variance-Reduced Methods for Machine Learning},
  author = {Robert M. Gower and Mark Schmidt and Francis Bach and Peter Richt{\'a}rik},
  journal= {arXiv preprint arXiv:2010.00892},
  year   = {2020}
}

Comments

16 pages, 7 figures, 1 table