Related papers: Shuffling Momentum Gradient Algorithm for Convex O…

SMG: A Shuffling Gradient-Based Method with Momentum

We combine two advanced ideas widely used in optimization for machine learning: shuffling strategy and momentum technique to develop a novel shuffling gradient-based method with momentum, coined Shuffling Momentum Gradient (SMG), for…

Optimization and Control · Mathematics 2021-06-10 Trang H. Tran, Lam M. Nguyen, Quoc Tran-Dinh

On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms

Stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD which…

Machine Learning · Computer Science 2023-10-27 Lam M. Nguyen, Trang H. Tran

First and Second Order Approximations to Stochastic Gradient Descent Methods with Momentum Terms

Stochastic Gradient Descent (SGD) methods see many uses in optimization problems. Modifications to the algorithm, such as momentum-based SGD methods, have been known to produce better results in certain cases. Much of this, however, is due…

Machine Learning · Computer Science 2025-04-22 Eric Lu

Generalized Stochastic Gradient Descent with Momentum Methods for Smooth Optimization

Stochastic gradient descent with momentum (SGDM) methods have become fundamental optimization tools in machine learning, combining the computational efficiency of stochastic gradients with the acceleration benefits of momentum. Despite…

Optimization and Control · Mathematics 2026-03-02 Zimeng Wang, Alp Yurtsever

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be $O(\log(T)/T)$, by running SGD for…

Machine Learning · Computer Science 2015-03-19 Alexander Rakhlin, Ohad Shamir, Karthik Sridharan

Improved Last-Iterate Convergence of Shuffling Gradient Methods for Nonsmooth Convex Optimization

We study the convergence of the shuffling gradient method, a popular algorithm employed to minimize the finite-sum function with regularization, in which functions are passed to apply (Proximal) Gradient Descent (GD) one by one whose order…

Optimization and Control · Mathematics 2025-05-30 Zijian Liu, Zhengyuan Zhou

Stochastic Gradient Descent Revisited

Stochastic gradient descent (SGD) has been a go-to algorithm for nonconvex stochastic optimization problems arising in machine learning. Its theory however often requires a strong framework to guarantee convergence properties. We hereby…

Optimization and Control · Mathematics 2025-03-11 Azar Louzi

Stochastic Gradient Descent in the Viewpoint of Graduated Optimization

Stochastic gradient descent (SGD) method is popular for solving non-convex optimization problems in machine learning. This work investigates SGD from a viewpoint of graduated optimization, which is a widely applied approach for non-convex…

Optimization and Control · Mathematics 2023-08-15 Da Li, Jingjing Wu, Qingrun Zhang

$\mu^2$-SGD: Stable Stochastic Optimization via a Double Momentum Mechanism

We consider stochastic convex optimization problems where the objective is an expectation over smooth functions. For this setting we suggest a novel gradient estimate that combines two recent mechanisms that are related to the notion of…

Machine Learning · Computer Science 2025-03-06 Tehila Dahan, Kfir Y. Levy

A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization

The Momentum Stochastic Gradient Descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning, e.g., training deep neural networks and variational Bayesian inference. Despite its empirical…

Machine Learning · Computer Science 2021-03-09 Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao

A High Probability Analysis of Adaptive SGD with Momentum

Stochastic Gradient Descent (SGD) and its variants are the most used algorithms in machine learning applications. In particular, SGD with adaptive learning rates and momentum is the industry standard to train deep networks. Despite the…

Machine Learning · Statistics 2020-07-29 Xiaoyu Li, Francesco Orabona

Optimal Adaptive and Accelerated Stochastic Gradient Descent

Stochastic gradient descent (SGD) methods are the most powerful optimization tools in training machine learning and deep learning models. Moreover, acceleration (a.k.a. momentum) methods and diagonal scaling (a.k.a. adaptive…

Machine Learning · Statistics 2018-10-02 Qi Deng, Yi Cheng, Guanghui Lan

Stochastic Variance Reduction for Nonconvex Optimization

We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient…

Optimization and Control · Mathematics 2016-04-06 Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola

A Unified Analysis of Stochastic Momentum Methods for Deep Learning

Stochastic momentum methods have been widely adopted in training deep neural networks. However, their theoretical analysis of convergence of the training objective and the generalization error for prediction is still under-explored. This…

Machine Learning · Computer Science 2018-08-31 Yan Yan, Tianbao Yang, Zhe Li, Qihang Lin, Yi Yang

Escaping Saddle Points Faster with Stochastic Momentum

Stochastic gradient descent (SGD) with stochastic momentum is popular in nonconvex stochastic optimization and particularly for the training of deep neural networks. In standard SGD, parameters are updated by improving along the path of the…

Machine Learning · Computer Science 2021-06-08 Jun-Kun Wang, Chi-Heng Lin, Jacob Abernethy

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure

Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for…

Machine Learning · Statistics 2017-11-16 Alberto Bietti, Julien Mairal

Nesterov Accelerated Shuffling Gradient Method for Convex Optimization

In this paper, we propose Nesterov Accelerated Shuffling Gradient (NASG), a new algorithm for the convex finite-sum minimization problems. Our method integrates the traditional Nesterov's acceleration momentum with different shuffling…

Optimization and Control · Mathematics 2022-06-14 Trang H. Tran, Katya Scheinberg, Lam M. Nguyen

Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes

Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required non-trivial smoothness…

Machine Learning · Computer Science 2013-01-01 Ohad Shamir, Tong Zhang

On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization

In this paper, we provide a comprehensive theoretical analysis of Stochastic Gradient Descent (SGD) and its momentum variants (Polyak Heavy-Ball and Nesterov) for tracking time-varying optima under strong convexity and smoothness. Our…

Machine Learning · Statistics 2026-05-20 Sharan Sahu, Cameron J. Hogan, Martin T. Wells

A Unified Convergence Analysis for Shuffling-Type Gradient Methods

In this paper, we propose a unified convergence analysis for a class of generic shuffling-type gradient methods for solving finite-sum optimization problems. Our analysis works with any sampling without replacement strategy and covers many…

Optimization and Control · Mathematics 2021-09-21 Lam M. Nguyen, Quoc Tran-Dinh, Dzung T. Phan, Phuong Ha Nguyen, Marten van Dijk