Related papers: Does Momentum Help? A Sample Complexity Analysis

On the insufficiency of existing momentum schemes for Stochastic Optimization

Momentum based stochastic gradient methods such as heavy ball (HB) and Nesterov's accelerated gradient descent (NAG) method are widely used in practice for training deep networks and other supervised learning models, as they often provide…

Machine Learning · Computer Science · 2018-08-02 · Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade

A Unified Analysis of Stochastic Momentum Methods for Deep Learning

Stochastic momentum methods have been widely adopted in training deep neural networks. However, their theoretical analysis of convergence of the training objective and the generalization error for prediction is still under-explored. This…

Machine Learning · Computer Science · 2018-08-31 · Yan Yan, Tianbao Yang, Zhe Li, Qihang Lin, Yi Yang

Understanding the Role of Momentum in Stochastic Gradient Methods

The use of momentum in stochastic gradient methods has become a widespread practice in machine learning. Different variants of momentum, including heavy-ball momentum, Nesterov's accelerated gradient (NAG), and quasi-hyperbolic momentum…

Machine Learning · Computer Science · 2019-10-31 · Igor Gitman, Hunter Lang, Pengchuan Zhang, Lin Xiao

Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise

Heavy-ball momentum with decaying learning rates is widely used with SGD for optimizing deep learning models. In contrast to its empirical popularity, the understanding of its theoretical property is still quite limited, especially under…

Machine Learning · Computer Science · 2024-03-19 · Rui Pan, Yuxing Liu, Xiaoyu Wang, Tong Zhang

Accelerating Variance-Reduced Stochastic Gradient Methods

Variance reduction is a crucial tool for improving the slow convergence of stochastic gradient descent. Only a few variance-reduced methods, however, have yet been shown to directly benefit from Nesterov's acceleration techniques to match…

Optimization and Control · Mathematics · 2020-10-30 · Derek Driggs, Matthias J. Ehrhardt, Carola-Bibiane Schönlieb

(Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum

Stochastic heavy ball momentum (SHB) is commonly used to train machine learning models, and often provides empirical improvements over stochastic gradient descent. By primarily focusing on strongly-convex quadratics, we aim to better…

Optimization and Control · Mathematics · 2025-06-02 · Anh Dang, Reza Babanezhad, Sharan Vaswani

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem. Our framework, inspired by random matrix theory, provides an exact (deterministic) characterization for the sequence of…

Optimization and Control · Mathematics · 2021-10-27 · Courtney Paquette, Elliot Paquette

Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization

Recently, stochastic momentum methods have been widely adopted in training deep neural networks. However, their convergence analysis is still underexplored at the moment, in particular for non-convex optimization. This paper fills the…

Optimization and Control · Mathematics · 2016-05-06 · Tianbao Yang, Qihang Lin, Zhe Li

On adaptive stochastic heavy ball momentum for solving linear systems

The stochastic heavy ball momentum (SHBM) method has gained considerable popularity as a scalable approach for solving large-scale optimization problems. However, one limitation of this method is its reliance on prior knowledge of certain…

Optimization and Control · Mathematics · 2024-04-04 · Yun Zeng, Deren Han, Yansheng Su, Jiaxin Xie

Convergence of Momentum-Based Optimization Algorithms with Time-Varying Parameters

In this paper, we present a unified algorithm for stochastic optimization that makes use of a "momentum" term; in other words, the stochastic gradient depends not only on the current true gradient of the objective function, but also on the…

Optimization and Control · Mathematics · 2025-09-10 · Mathukumalli Vidyasagar

Shuffling Momentum Gradient Algorithm for Convex Optimization

The Stochastic Gradient Descent method (SGD) and its stochastic variants have become methods of choice for solving finite-sum optimization problems arising from machine learning and data science thanks to their ability to handle large-scale…

Optimization and Control · Mathematics · 2024-03-06 · Trang H. Tran, Quoc Tran-Dinh, Lam M. Nguyen

Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance

In this paper, a general stochastic optimization procedure is studied, unifying several variants of the stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient…

Optimization and Control · Mathematics · 2021-07-13 · A. Barakat, P. Bianchi, W. Hachem, Sh. Schechtman

Training Deep Neural Networks with Adaptive Momentum Inspired by the Quadratic Optimization

Heavy ball momentum is crucial in accelerating (stochastic) gradient-based optimization algorithms for machine learning. Existing heavy ball momentum is usually weighted by a uniform hyperparameter, which relies on excessive tuning.…

Machine Learning · Computer Science · 2021-10-19 · Tao Sun, Huaming Ling, Zuoqiang Shi, Dongsheng Li, Bao Wang

On the Convergence Analysis of Aggregated Heavy-Ball Method

Momentum first-order optimization methods are the workhorses in various optimization tasks, e.g., in the training of deep neural networks. Recently, Lucas et al. (2019) proposed a method called Aggregated Heavy-Ball (AggHB) that uses…

Optimization and Control · Mathematics · 2022-03-07 · Marina Danilova

Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods

In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual…

Optimization and Control · Mathematics · 2018-03-30 · Nicolas Loizou, Peter Richtárik

The Marginal Value of Momentum for Small Learning Rate SGD

Momentum is known to accelerate the convergence of gradient descent in strongly convex settings without stochastic gradient noise. In stochastic optimization, such as training neural networks, folklore suggests that momentum may help deep…

Machine Learning · Computer Science · 2024-04-17 · Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li

Robust Gradient Descent via Heavy-Ball Momentum with Predictive Extrapolation

Accelerated gradient methods like Nesterov's Accelerated Gradient (NAG) achieve faster convergence on well-conditioned problems but often diverge on ill-conditioned or non-convex landscapes due to aggressive momentum accumulation. We…

Machine Learning · Computer Science · 2025-12-12 · Sarwan Ali

Projected Stochastic Momentum Methods for Nonlinear Equality-Constrained Optimization for Machine Learning

Two algorithms are proposed, analyzed, and tested for solving continuous optimization problems with nonlinear equality constraints. Each is an extension of a stochastic momentum-based method from the unconstrained setting to the setting of…

Optimization and Control · Mathematics · 2026-01-21 · Qi Wang, Christian Piermarini, Yunlang Zhu, Frank E. Curtis

First and Second Order Approximations to Stochastic Gradient Descent Methods with Momentum Terms

Stochastic Gradient Descent (SGD) methods see many uses in optimization problems. Modifications to the algorithm, such as momentum-based SGD methods, have been known to produce better results in certain cases. Much of this, however, is due…

Machine Learning · Computer Science · 2025-04-22 · Eric Lu

Gradient correlation is a key ingredient to accelerate SGD with momentum

Empirically, it has been observed that adding momentum to Stochastic Gradient Descent (SGD) accelerates the convergence of the algorithm. However, the literature has been rather pessimistic, even in the case of convex functions, about the…

Optimization and Control · Mathematics · 2025-01-27 · Julien Hermant, Marien Renaud, Jean-François Aujol, Charles Dossal, Aude Rondepierre
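
Several entries above analyze heavy ball (HB) and Nesterov (NAG) momentum, and Wang et al. (2024) recall that momentum is known to accelerate gradient descent on strongly convex problems when there is no gradient noise. The sketch below illustrates that deterministic effect and what changes once noise is added. It is a minimal illustration, not the method or setting of any listed paper. It assumes a two-dimensional diagonal quadratic f(x) = 0.5 * sum(a_i * x_i^2) with curvatures mu = 1 and L = 100 and an additive Gaussian gradient-noise model. It also uses textbook parameter choices: step 2/(L + mu) for GD, Polyak's tuning for HB, and step 1/L with momentum (sqrt(kappa) - 1)/(sqrt(kappa) + 1) for NAG. The helper names `momentum_step` and `iters_to_tol` are hypothetical.

```python
import math
import random


def momentum_step(x, x_prev, a, lr, beta, nesterov, sigma, rng):
    """One step on f(x) = 0.5 * sum(a_i * x_i**2) with an additive-Gaussian-noise gradient oracle
    (illustrative noise model, assumed here).

    Heavy ball (HB): x_next = x - lr * g(x) + beta * (x - x_prev)
    Nesterov (NAG):  y = x + beta * (x - x_prev);  x_next = y - lr * g(y)
    beta = 0 reduces both to plain (stochastic) gradient descent.
    """
    # NAG evaluates the gradient at the extrapolated point y; HB evaluates it at x.
    y = [xi + beta * (xi - pi) for xi, pi in zip(x, x_prev)] if nesterov else x
    g = [ai * yi + rng.gauss(0.0, sigma) for ai, yi in zip(a, y)]
    if nesterov:
        return [yi - lr * gi for yi, gi in zip(y, g)]
    return [xi - lr * gi + beta * (xi - pi) for xi, gi, pi in zip(x, g, x_prev)]


def iters_to_tol(a, lr, beta, nesterov=False, sigma=0.0, tol=1e-6, max_iters=10_000, seed=0):
    """Iterations until max|x_i| < tol, starting from x0 = (1, ..., 1); returns max_iters if never reached."""
    rng = random.Random(seed)
    x_prev = [1.0] * len(a)
    x = [1.0] * len(a)
    for t in range(1, max_iters + 1):
        # The right-hand side is evaluated with the old (x, x_prev) before reassignment.
        x_prev, x = x, momentum_step(x, x_prev, a, lr, beta, nesterov, sigma, rng)
        if max(abs(v) for v in x) < tol:
            return t
    return max_iters


mu, L = 1.0, 100.0  # smallest / largest curvature of the quadratic (illustrative values)
a = [mu, L]
kappa = L / mu
sqk = math.sqrt(kappa)

# GD with step 2 / (L + mu): linear rate (kappa - 1) / (kappa + 1).
gd_iters = iters_to_tol(a, lr=2.0 / (L + mu), beta=0.0)

# HB with Polyak's tuning for quadratics: asymptotic rate (sqrt(kappa) - 1) / (sqrt(kappa) + 1).
hb_lr = 4.0 / (math.sqrt(L) + math.sqrt(mu)) ** 2
hb_beta = ((sqk - 1.0) / (sqk + 1.0)) ** 2
hb_iters = iters_to_tol(a, lr=hb_lr, beta=hb_beta)

# NAG with step 1/L and constant momentum (sqrt(kappa) - 1) / (sqrt(kappa) + 1).
nag_iters = iters_to_tol(a, lr=1.0 / L, beta=(sqk - 1.0) / (sqk + 1.0), nesterov=True)

print(f"GD: {gd_iters}, HB: {hb_iters}, NAG: {nag_iters} iterations to reach max|x_i| < 1e-6")
```

With `sigma = 0`, HB and NAG should need several times fewer iterations than GD, consistent with their dependence on sqrt(kappa) rather than kappa. With `sigma > 0` (for example 0.1), the iterates typically settle in a noise-dominated neighborhood of the optimum, so `iters_to_tol` runs to `max_iters`. That stochastic regime is the one most of the listed papers analyze.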