
Related papers: Does Momentum Help? A Sample Complexity Analysis

200 papers

Momentum based stochastic gradient methods such as heavy ball (HB) and Nesterov's accelerated gradient descent (NAG) method are widely used in practice for training deep networks and other supervised learning models, as they often provide…

Machine Learning · Computer Science · 2018-08-02 · Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade
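For reference, the two momentum schemes named in the abstract above are commonly written as follows. This uses one standard parameterization and is not taken from the paper. It assumes a step size α, a momentum parameter β, and a (possibly stochastic) gradient oracle g; individual papers differ in their exact conventions. The key difference is that NAG evaluates the gradient at the extrapolated point y_k rather than at x_k.

```latex
\begin{aligned}
\text{HB:}\quad  & x_{k+1} = x_k - \alpha\, g(x_k) + \beta\,(x_k - x_{k-1}), \\
\text{NAG:}\quad & y_k = x_k + \beta\,(x_k - x_{k-1}), \qquad x_{k+1} = y_k - \alpha\, g(y_k).
\end{aligned}
```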

Stochastic momentum methods have been widely adopted in training deep neural networks. However, their theoretical analysis of convergence of the training objective and the generalization error for prediction is still under-explored. This…

Machine Learning · Computer Science · 2018-08-31 · Yan Yan, Tianbao Yang, Zhe Li, Qihang Lin, Yi Yang

The use of momentum in stochastic gradient methods has become a widespread practice in machine learning. Different variants of momentum, including heavy-ball momentum, Nesterov's accelerated gradient (NAG), and quasi-hyperbolic momentum…

Machine Learning · Computer Science · 2019-10-31 · Igor Gitman, Hunter Lang, Pengchuan Zhang, Lin Xiao
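Quasi-hyperbolic momentum (QHM), mentioned in the abstract above, mixes a plain gradient step with a momentum step. Below is a minimal single-step sketch of QHM in its usual parameterization. It assumes a learning rate `alpha`, a momentum parameter `beta` for an exponential-moving-average buffer, and a mixing weight `nu`. The helper name `qhm_step` is hypothetical, and this is not code from the listed paper.

```python
from typing import List, Tuple


def qhm_step(
    theta: List[float],
    buf: List[float],
    grad: List[float],
    alpha: float,
    beta: float,
    nu: float,
) -> Tuple[List[float], List[float]]:
    """One quasi-hyperbolic momentum step (hypothetical helper).

    buf is an exponential moving average of past gradients:
        buf <- beta * buf + (1 - beta) * grad
    and the update mixes the raw gradient with that average:
        theta <- theta - alpha * ((1 - nu) * grad + nu * buf)
    """
    new_buf = [beta * b + (1.0 - beta) * g for b, g in zip(buf, grad)]
    new_theta = [
        t - alpha * ((1.0 - nu) * g + nu * b)
        for t, g, b in zip(theta, grad, new_buf)
    ]
    return new_theta, new_buf


if __name__ == "__main__":
    theta, buf = [1.0], [0.0]
    for _ in range(3):
        grad = [2.0 * theta[0]]  # gradient of f(theta) = theta**2
        theta, buf = qhm_step(theta, buf, grad, alpha=0.1, beta=0.9, nu=0.7)
        print(theta, buf)
```

Setting `nu = 0` reduces the step to plain SGD. Setting `nu = 1` gives momentum with a moving-average buffer, and intermediate values interpolate between the two.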

Heavy-ball momentum with decaying learning rates is widely used with SGD for optimizing deep learning models. In contrast to its empirical popularity, the understanding of its theoretical property is still quite limited, especially under…

Machine Learning · Computer Science · 2024-03-19 · Rui Pan, Yuxing Liu, Xiaoyu Wang, Tong Zhang

Variance reduction is a crucial tool for improving the slow convergence of stochastic gradient descent. Only a few variance-reduced methods, however, have yet been shown to directly benefit from Nesterov's acceleration techniques to match…

Optimization and Control · Mathematics · 2020-10-30 · Derek Driggs, Matthias J. Ehrhardt, Carola-Bibiane Schönlieb

Stochastic heavy ball momentum (SHB) is commonly used to train machine learning models, and often provides empirical improvements over stochastic gradient descent. By primarily focusing on strongly-convex quadratics, we aim to better…

Optimization and Control · Mathematics · 2025-06-02 · Anh Dang, Reza Babanezhad, Sharan Vaswani

We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem. Our framework, inspired by random matrix theory, provides an exact (deterministic) characterization for the sequence of…

Optimization and Control · Mathematics · 2021-10-27 · Courtney Paquette, Elliot Paquette

Recently, stochastic momentum methods have been widely adopted in training deep neural networks. However, their convergence analysis is still underexplored at the moment, in particular for non-convex optimization. This paper fills the…

Optimization and Control · Mathematics · 2016-05-06 · Tianbao Yang, Qihang Lin, Zhe Li

The stochastic heavy ball momentum (SHBM) method has gained considerable popularity as a scalable approach for solving large-scale optimization problems. However, one limitation of this method is its reliance on prior knowledge of certain…

Optimization and Control · Mathematics · 2024-04-04 · Yun Zeng, Deren Han, Yansheng Su, Jiaxin Xie

In this paper, we present a unified algorithm for stochastic optimization that makes use of a "momentum" term; in other words, the stochastic gradient depends not only on the current true gradient of the objective function, but also on the…

Optimization and Control · Mathematics · 2025-09-10 · Mathukumalli Vidyasagar

The Stochastic Gradient Descent method (SGD) and its stochastic variants have become methods of choice for solving finite-sum optimization problems arising from machine learning and data science thanks to their ability to handle large-scale…

Optimization and Control · Mathematics · 2024-03-06 · Trang H. Tran, Quoc Tran-Dinh, Lam M. Nguyen

In this paper, a general stochastic optimization procedure is studied, unifying several variants of the stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient…

Optimization and Control · Mathematics · 2021-07-13 · A. Barakat, P. Bianchi, W. Hachem, Sh. Schechtman

Heavy ball momentum is crucial in accelerating (stochastic) gradient-based optimization algorithms for machine learning. Existing heavy ball momentum is usually weighted by a uniform hyperparameter, which relies on excessive tuning.…

Machine Learning · Computer Science · 2021-10-19 · Tao Sun, Huaming Ling, Zuoqiang Shi, Dongsheng Li, Bao Wang

Momentum first-order optimization methods are the workhorses in various optimization tasks, e.g., in the training of deep neural networks. Recently, Lucas et al. (2019) proposed a method called Aggregated Heavy-Ball (AggHB) that uses…

Optimization and Control · Mathematics · 2022-03-07 · Marina Danilova

In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual…

Optimization and Control · Mathematics · 2018-03-30 · Nicolas Loizou, Peter Richtárik

Momentum is known to accelerate the convergence of gradient descent in strongly convex settings without stochastic gradient noise. In stochastic optimization, such as training neural networks, folklore suggests that momentum may help deep…

Machine Learning · Computer Science · 2024-04-17 · Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
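The noiseless part of the claim in the abstract above (momentum accelerates gradient descent on strongly convex problems) can be checked on a toy problem. The sketch below is illustrative only and is not code from the paper. It assumes a two-dimensional diagonal quadratic with curvatures `MU = 1` and `L = 100`, exact gradients, and the classical tunings: step 2/(L + mu) for GD, Polyak's step size and momentum for heavy ball, and step 1/L with momentum (sqrt(kappa) - 1)/(sqrt(kappa) + 1) for NAG. All names are hypothetical.

```python
import math
from typing import List

Vec = List[float]

MU, L = 1.0, 100.0   # smallest / largest curvature (assumed toy values)
EIGS = [MU, L]       # f(x) = 0.5 * sum(eig_i * x_i**2), minimizer at 0


def grad(x: Vec) -> Vec:
    # Exact (noise-free) gradient of the diagonal quadratic.
    return [e * xi for e, xi in zip(EIGS, x)]


def norm(x: Vec) -> float:
    return math.sqrt(sum(xi * xi for xi in x))


def gd(x: Vec, steps: int) -> Vec:
    alpha = 2.0 / (L + MU)  # best constant step size for GD on this problem
    for _ in range(steps):
        x = [xi - alpha * gi for xi, gi in zip(x, grad(x))]
    return x


def heavy_ball(x: Vec, steps: int) -> Vec:
    # Polyak's tuning for quadratics with curvature in [MU, L].
    s_l, s_mu = math.sqrt(L), math.sqrt(MU)
    alpha = 4.0 / (s_l + s_mu) ** 2
    beta = ((s_l - s_mu) / (s_l + s_mu)) ** 2
    v = [0.0] * len(x)  # velocity v_k = x_k - x_{k-1}, starting at zero
    for _ in range(steps):
        v = [beta * vi - alpha * gi for vi, gi in zip(v, grad(x))]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


def nag(x: Vec, steps: int) -> Vec:
    # Nesterov's method for strongly convex f: step 1/L, momentum from sqrt(kappa).
    kappa = L / MU
    beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    x_prev = x
    for _ in range(steps):
        y = [xi + beta * (xi - pi) for xi, pi in zip(x, x_prev)]  # look-ahead point
        x_prev = x
        x = [yi - gi / L for yi, gi in zip(y, grad(y))]  # gradient step taken at y
    return x


if __name__ == "__main__":
    x0 = [1.0, 1.0]
    for name, method in [("GD", gd), ("HB", heavy_ball), ("NAG", nag)]:
        print(f"{name:>3}: ||x_100|| = {norm(method(x0, 100)):.2e}")
```

This illustrates only the deterministic case. Both momentum methods also rely on knowing `MU` and `L` to set their parameters. With stochastic gradient noise the comparison can change, and that setting is the one several papers in this list study.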

Accelerated gradient methods like Nesterov's Accelerated Gradient (NAG) achieve faster convergence on well-conditioned problems but often diverge on ill-conditioned or non-convex landscapes due to aggressive momentum accumulation. We…

Machine Learning · Computer Science · 2025-12-12 · Sarwan Ali

Two algorithms are proposed, analyzed, and tested for solving continuous optimization problems with nonlinear equality constraints. Each is an extension of a stochastic momentum-based method from the unconstrained setting to the setting of…

Optimization and Control · Mathematics · 2026-01-21 · Qi Wang, Christian Piermarini, Yunlang Zhu, Frank E. Curtis

Stochastic Gradient Descent (SGD) methods see many uses in optimization problems. Modifications to the algorithm, such as momentum-based SGD methods have been known to produce better results in certain cases. Much of this, however, is due…

Machine Learning · Computer Science · 2025-04-22 · Eric Lu

Empirically, it has been observed that adding momentum to Stochastic Gradient Descent (SGD) accelerates the convergence of the algorithm. However, the literature has been rather pessimistic, even in the case of convex functions, about the…

Optimization and Control · Mathematics · 2025-01-27 · Julien Hermant, Marien Renaud, Jean-François Aujol, Charles Dossal, Aude Rondepierre