Related papers: Accelerated Gradient Flow: Risk, Stability, and Im…

Implicit Regularization of Accelerated Methods in Hilbert Spaces

We study learning properties of accelerated gradient descent methods for linear least-squares in Hilbert spaces. We analyze the implicit regularization properties of Nesterov acceleration and a variant of heavy-ball in terms of…

Machine Learning · Computer Science · 2019-12-17 · Nicolò Pagliana, Lorenzo Rosasco

Acceleration and Implicit Regularization in Gaussian Phase Retrieval

We study accelerated optimization methods in the Gaussian phase retrieval problem. In this setting, we prove that gradient methods with Polyak or Nesterov momentum have similar implicit regularization to gradient descent. This implicit…

Optimization and Control · Mathematics · 2023-11-23 · Tyler Maunu, Martin Molina-Fructuoso

Regularized Risk Minimization by Nesterov's Accelerated Gradient Methods: Algorithmic Extensions and Empirical Studies

Nesterov's accelerated gradient methods (AGM) have been successfully applied in many machine learning areas. However, their empirical performance on training max-margin models has been inferior to existing specialized solvers. In this…

Machine Learning · Computer Science · 2010-11-03 · Xinhua Zhang, Ankan Saha, S. V. N. Vishwanathan

Accelerating Stochastic Gradient Descent For Least Squares Regression

There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error…

Machine Learning · Statistics · 2018-08-02 · Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford

Conformal Symplectic and Relativistic Optimization

Arguably, the two most popular accelerated or momentum-based optimization methods in machine learning are Nesterov's accelerated gradient and Polyak's heavy ball, both corresponding to different discretizations of a particular second order…

Optimization and Control · Mathematics · 2020-12-25 · Guilherme França, Jeremias Sulam, Daniel P. Robinson, René Vidal

Accelerated Gradient Methods for Nonconvex Optimization: Escape Trajectories From Strict Saddle Points and Convergence to Local Minima

This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heavy…

Optimization and Control · Mathematics · 2026-04-07 · Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa

Incorporating Preconditioning into Accelerated Approaches: Theoretical Guarantees and Practical Improvement

Machine learning and deep learning are widely researched fields that provide solutions to many modern problems. Due to the complexity of new problems related to the size of datasets, efficient approaches are obligatory. In optimization…

Optimization and Control · Mathematics · 2025-10-01 · Stepan Trifonov, Leonid Levin, Savelii Chezhegov, Aleksandr Beznosikov

Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization

Recently, stochastic momentum methods have been widely adopted in training deep neural networks. However, their convergence analysis is still underexplored at the moment, in particular for non-convex optimization. This paper fills the…

Optimization and Control · Mathematics · 2016-05-06 · Tianbao Yang, Qihang Lin, Zhe Li

Accelerated Performance and Accelerated Learning with Discrete-Time High-Order Tuners

We consider two high-order tuners that have been shown to have accelerated performance, one based on Polyak's heavy ball method and another based on Nesterov's acceleration method. We show that parameter estimates are bounded and converge…

Optimization and Control · Mathematics · 2022-09-15 · Yingnan Cui, Anuradha M. Annaswamy

Algorithmic Instabilities of Accelerated Gradient Descent

We study the algorithmic stability of Nesterov's accelerated gradient method. For convex quadratic objectives, Chen et al. (2018) proved that the uniform stability of the method grows quadratically with the number of optimization steps, and…

Machine Learning · Computer Science · 2021-06-22 · Amit Attia, Tomer Koren

A Discrete Variational Derivation of Accelerated Methods in Optimization

Many of the new developments in machine learning are connected with gradient-based optimization methods. Recently, these methods have been studied using a variational perspective. This has opened up the possibility of introducing…

Optimization and Control · Mathematics · 2024-04-17 · Cédric M. Campos, Alejandro Mahillo, David Martín de Diego

Stability and Convergence Trade-off of Iterative Optimization Algorithms

The overall performance or expected excess risk of an iterative machine learning algorithm can be decomposed into training error and generalization error. While the former is controlled by its convergence analysis, the latter can be tightly…

Machine Learning · Statistics · 2018-04-06 · Yuansi Chen, Chi Jin, Bin Yu

Differentially Private Accelerated Optimization Algorithms

We present two classes of differentially private optimization algorithms derived from the well-known accelerated first-order methods. The first algorithm is inspired by Polyak's heavy ball method and employs a smoothing approach to decrease…

Machine Learning · Computer Science · 2022-05-17 · Nurdan Kuru, Ş. İlker Birbil, Mert Gurbuzbalaban, Sinan Yildirim

Gradient Norm Minimization of Nesterov Acceleration: $o(1/k^3)$

In the history of first-order algorithms, Nesterov's accelerated gradient descent (NAG) is one of the milestones. However, the cause of the acceleration has been a mystery for a long time. It has not been revealed with the existence of…

Optimization and Control · Mathematics · 2022-09-20 · Shuo Chen, Bin Shi, Ya-xiang Yuan

Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization

Stochastic gradient methods with momentum are widely used in applications and at the core of optimization subroutines in many popular machine learning libraries. However, their sample complexities have not been obtained for problems beyond…

Optimization and Control · Mathematics · 2021-02-12 · Vien V. Mai, Mikael Johansson

Stochastic Gradient Descent with Momentum is Algorithmically Stable

Stochastic gradient descent with momentum (SGDM) is one of the most widely used optimization algorithms in machine learning. While optimization properties of SGDM have been extensively studied in the literature, it remains insufficiently…

Machine Learning · Computer Science · 2026-05-28 · Yunwen Lei, Zimeng Wang, Xiaoming Yuan

Robustly Stable Accelerated Momentum Methods With A Near-Optimal L2 Gain and $H_\infty$ Performance

We consider the problem of minimizing a strongly convex smooth function where the gradients are subject to additive worst-case deterministic errors that are square-summable. We study the trade-offs between the convergence rate and…

Optimization and Control · Mathematics · 2023-10-23 · Mert Gurbuzbalaban

Accelerated regularized learning in finite N-person games

Motivated by the success of Nesterov's accelerated gradient algorithm for convex minimization problems, we examine whether it is possible to achieve similar performance gains in the context of online learning in games. To that end, we…

Computer Science and Game Theory · Computer Science · 2024-12-31 · Kyriakos Lotidis, Angeliki Giannou, Panayotis Mertikopoulos, Nicholas Bambos

A More Stable Accelerated Gradient Method Inspired by Continuous-Time Perspective

Nesterov's accelerated gradient method (NAG) is widely used in problems with machine learning background including deep learning, and is corresponding to a continuous-time differential equation. From this connection, the property of the…

Optimization and Control · Mathematics · 2022-04-05 · Yasong Feng, Weiguo Gao

Estimating Implicit Regularization in Deep Learning

Deep learning systems are known to exhibit implicit regularization (alt. implicit bias), favoring simple solutions instead of merely minimizing the loss function. In some cases, we can analytically derive the implicit regularization --…

Machine Learning · Statistics · 2026-05-08 · Joseph H. Rudoler, Kevin Tan, Giles Hooker, Konrad P. Kording