Related papers: Variance-Reduced Methods for Machine Learning

VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning

In this paper, we propose a simple variant of the original SVRG, called variance reduced stochastic gradient descent (VR-SGD). Unlike the choices of snapshot and starting points in SVRG and its proximal variant, Prox-SVRG, the two vectors…

Machine Learning · Computer Science 2018-10-31 Fanhua Shang, Kaiwen Zhou, Hongying Liu, James Cheng, Ivor W. Tsang, Lijun Zhang, Dacheng Tao, Licheng Jiao

Learning Theory of the SVRG: Generalization and Convergence Analysis

Variance reduction (VR) methods employ stochastic gradients with decreasing variance, and they have been widely applied to solve large-scale optimization problems in machine learning because of their efficiency. Existing theoretical studies…

Machine Learning · Computer Science 2026-05-28 Yunwen Lei, Zimeng Wang, Xiaoming Yuan

Larger is Better: The Effect of Learning Rates Enjoyed by Stochastic Optimization with Progressive Variance Reduction

In this paper, we propose a simple variant of the original stochastic variance reduction gradient (SVRG), which we hereafter refer to as variance reduced stochastic gradient descent (VR-SGD). Different from the choices of the snapshot…

Machine Learning · Computer Science 2017-04-18 Fanhua Shang

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure

Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for…

Machine Learning · Statistics 2017-11-16 Alberto Bietti, Julien Mairal

Stochastic Gradient Descent Revisited

Stochastic gradient descent (SGD) has been a go-to algorithm for nonconvex stochastic optimization problems arising in machine learning. Its theory however often requires a strong framework to guarantee convergence properties. We hereby…

Optimization and Control · Mathematics 2025-03-11 Azar Louzi

Variance Reduction for Distributed Stochastic Gradient Descent

Variance reduction (VR) methods boost the performance of stochastic gradient descent (SGD) by enabling the use of larger, constant stepsizes and preserving linear convergence rates. However, current variance reduced SGD methods require…

Machine Learning · Computer Science 2017-04-10 Soham De, Gavin Taylor, Tom Goldstein

Stochastic Gradient Descent in the Viewpoint of Graduated Optimization

The stochastic gradient descent (SGD) method is popular for solving non-convex optimization problems in machine learning. This work investigates SGD from the viewpoint of graduated optimization, which is a widely applied approach for non-convex…

Optimization and Control · Mathematics 2023-08-15 Da Li, Jingjing Wu, Qingrun Zhang

Stochastic Variance Reduction for Nonconvex Optimization

We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient…

Optimization and Control · Mathematics 2016-04-06 Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola

Variance Reduction Methods Do Not Need to Compute Full Gradients: Improved Efficiency through Shuffling

Stochastic optimization algorithms are widely used for machine learning with large-scale data. However, their convergence often suffers from non-vanishing variance. Variance Reduction (VR) methods, such as SVRG and SARAH, address this issue…

Machine Learning · Computer Science 2026-01-12 Daniil Medyakov, Gleb Molodtsov, Savelii Chezhegov, Alexey Rebrikov, Aleksandr Beznosikov

Efficient Distributed SGD with Variance Reduction

Stochastic Gradient Descent (SGD) has become one of the most popular optimization methods for training machine learning models on massive datasets. However, SGD suffers from two main drawbacks: (i) The noisy gradient updates have high…

Machine Learning · Computer Science 2017-04-10 Soham De, Tom Goldstein

SVGD: A Virtual Gradients Descent Method for Stochastic Optimization

Inspired by dynamic programming, we propose Stochastic Virtual Gradient Descent (SVGD) algorithm where the Virtual Gradient is defined by computational graph and automatic differentiation. The method is computationally efficient and has…

Machine Learning · Computer Science 2019-08-01 Zheng Li, Shi Shu

Shuffling Momentum Gradient Algorithm for Convex Optimization

The Stochastic Gradient Descent method (SGD) and its stochastic variants have become methods of choice for solving finite-sum optimization problems arising from machine learning and data science thanks to their ability to handle large-scale…

Optimization and Control · Mathematics 2024-03-06 Trang H. Tran, Quoc Tran-Dinh, Lam M. Nguyen

Variance Reduced Stochastic Gradient Descent with Neighbors

Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness,…

Machine Learning · Computer Science 2016-02-29 Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, Brian McWilliams

Distributed Stochastic Optimization via Adaptive SGD

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent…

Machine Learning · Statistics 2018-10-30 Ashok Cutkosky, Robert Busa-Fekete

Reinforced stochastic gradient descent for deep neural network learning

Stochastic gradient descent (SGD) is a standard optimization method to minimize a training error with respect to network parameters in modern neural network learning. However, it typically suffers from proliferation of saddle points in the…

Machine Learning · Computer Science 2017-11-23 Haiping Huang, Taro Toyoizumi

Stochastic Gradient Descent with Biased but Consistent Gradient Estimators

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss…

Machine Learning · Computer Science 2019-12-24 Jie Chen, Ronny Luss

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have…

Machine Learning · Computer Science 2016-01-26 Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola

Stochastic gradient with least-squares control variates

The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by…

Optimization and Control · Mathematics 2025-11-21 Fabio Nobile, Matteo Raviola, Nathan Schaeffer

Optimal Adaptive and Accelerated Stochastic Gradient Descent

Stochastic gradient descent (SGD) methods are the most powerful optimization tools in training machine learning and deep learning models. Moreover, acceleration (a.k.a. momentum) methods and diagonal scaling (a.k.a. adaptive…

Machine Learning · Statistics 2018-10-02 Qi Deng, Yi Cheng, Guanghui Lan

On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points

Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning. While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable…

Machine Learning · Computer Science 2019-09-05 Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan