Variance-Reduced Methods for Machine Learning
Abstract
Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last 8 years have seen an exciting new development: variance reduction (VR) for stochastic optimization methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving faster convergence than SGD both in theory and in practice. These speedups underlie the surge of interest in VR methods and the fast-growing body of work on this topic. This review covers the key principles and main developments behind VR methods for optimization with finite data sets and is aimed at non-expert readers. We focus mainly on the convex setting and leave pointers for readers interested in extensions to minimizing non-convex functions.
Cite
@article{arxiv.2010.00892,
title = {Variance-Reduced Methods for Machine Learning},
author = {Robert M. Gower and Mark Schmidt and Francis Bach and Peter Richt{\'a}rik},
journal = {arXiv preprint arXiv:2010.00892},
year = {2020}
}
Comments
16 pages, 7 figures, 1 table