Related papers: Corridor Geometry in Gradient-Based Optimization

Convergence of gradient descent for deep neural networks

We give a simple local Polyak-Lojasiewicz (PL) criterion that guarantees linear (exponential) convergence of gradient flow and gradient descent to a zero-loss solution of a nonnegative objective. We then verify this criterion for the…

Machine Learning · Computer Science 2026-02-23 Sourav Chatterjee

Conditions for linear convergence of the gradient method for non-convex optimization

In this paper, we derive a new linear convergence rate for the gradient method with fixed step lengths for non-convex smooth optimization problems satisfying the Polyak-Lojasiewicz (PL) inequality. We establish that the PL inequality is a…

Optimization and Control · Mathematics 2022-04-05 Hadi Abbaszadehpeivasti, Etienne de Klerk, Moslem Zamani

Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent

Although the optimization objectives for learning neural networks are highly non-convex, gradient-based methods have been wildly successful at learning neural networks in practice. This juxtaposition has led to a number of recent studies on…

Machine Learning · Computer Science 2022-09-14 Spencer Frei, Quanquan Gu

Local Curvature Descent: Squeezing More Curvature out of Standard and Polyak Gradient Descent

We contribute to the growing body of knowledge on more powerful and adaptive stepsizes for convex optimization, empowered by local curvature information. We do not go the route of fully-fledged second-order methods which require the…

Optimization and Control · Mathematics 2024-05-28 Peter Richtárik, Simone Maria Giancola, Dymitr Lubczyk, Robin Yadav

Directional Smoothness and Gradient Methods: Convergence and Adaptivity

We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization rather than on global, worst-case constants. Key to our proofs is directional smoothness, a…

Machine Learning · Computer Science 2025-01-15 Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower

Learning to Accelerate by the Methods of Step-size Planning

Gradient descent is slow to converge for ill-conditioned problems and non-convex problems. An important technique for acceleration is step-size adaptation. The first part of this paper contains a detailed review of step-size adaptation…

Machine Learning · Computer Science 2022-05-27 Hengshuai Yao

Convergence of gradient descent for learning linear neural networks

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that under suitable conditions on…

Machine Learning · Computer Science 2021-11-25 Gabin Maxime Nguegnang, Holger Rauhut, Ulrich Terstiege

In-context Learning and Gradient Descent Revisited

In-context learning (ICL) has shown impressive results in few-shot learning tasks, yet its underlying mechanism is still not fully understood. A recent line of work suggests that ICL performs gradient descent (GD)-based optimization…

Computation and Language · Computer Science 2024-04-02 Gilad Deutch, Nadav Magar, Tomer Bar Natan, Guy Dar

Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth

A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, we argue that this belief is inaccurate. We show…

Optimization and Control · Mathematics 2025-11-11 Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang

Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks

We present and analyze a novel regularized form of the gradient clipping algorithm, proving that it converges to global minima of the loss surface of deep neural networks under the squared loss, provided that the layers are of sufficient…

Machine Learning · Computer Science 2025-04-09 Matteo Tucat, Anirbit Mukherjee, Procheta Sen, Mingfei Sun, Omar Rivasplata

Stochastic Gradient Descent with Polyak's Learning Rate

Stochastic gradient descent (SGD) for strongly convex functions converges at the rate $\mathcal{O}(1/k)$. However, achieving good results in practice requires tuning the parameters (for example the learning rate) of the algorithm. In this paper we…

Optimization and Control · Mathematics 2019-07-15 Adam M. Oberman, Mariana Prazeres

Optimization Insights into Deep Diagonal Linear Networks

Gradient-based methods successfully train highly overparameterized models in practice, even though the associated optimization problems are markedly nonconvex. Understanding the mechanisms that make such methods effective has become a…

Machine Learning · Computer Science 2026-01-21 Hippolyte Labarrière, Cesare Molinari, Lorenzo Rosasco, Cristian Vega, Silvia Villa

Faster Biological Gradient Descent Learning

Back-propagation is a popular machine learning algorithm that uses gradient descent in training neural networks for supervised learning, but can be very slow. A number of algorithms have been developed to speed up convergence and improve…

Neural and Evolutionary Computing · Computer Science 2020-09-29 Ho Ling Li

Continuous vs. Discrete Optimization of Deep Neural Networks

Existing analyses of optimization in deep learning are either continuous, focusing on (variants of) gradient flow, or discrete, directly treating (variants of) gradient descent. Gradient flow is amenable to theoretical analysis, but is…

Machine Learning · Computer Science 2021-12-30 Omer Elkabetz, Nadav Cohen

First order online optimisation using forward gradients in over-parameterised systems

The success of deep learning over the past decade mainly relies on gradient-based optimisation and backpropagation. This paper focuses on analysing the performance of first-order gradient-based optimisation algorithms, gradient descent and…

Optimization and Control · Mathematics 2022-12-08 Behnam Mafakheri, Iman Shames, Jonathan H. Manton

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition

In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Łojasiewicz inequality proposed in the same year, and it does not…

Machine Learning · Computer Science 2020-09-15 Hamed Karimi, Julie Nutini, Mark Schmidt

LOSSGRAD: automatic learning rate in gradient descent

In this paper, we propose a simple, fast and easy to implement algorithm LOSSGRAD (locally optimal step-size in gradient descent), which automatically modifies the step-size in gradient descent during neural networks training. Given a…

Machine Learning · Computer Science 2019-11-26 Bartosz Wójcik, Łukasz Maziarka, Jacek Tabor

Graph-Aware Learning Rates for Decentralized Optimization

We propose an adaptive step-size rule for decentralized optimization. Choosing a step-size that balances convergence and stability is challenging. This is amplified in the decentralized setting as agents observe only local (possibly…

Optimization and Control · Mathematics 2026-02-17 Aaron Fainman, Stefan Vlaski

Optimal Asymptotic Rates for (Stochastic) Gradient Descent under the Local PL-Condition: A Geometric Approach

Stochastic gradient descent (SGD) has been studied extensively over the past decades due to its simplicity and broad applicability in machine learning. In this work, we analyze the local behavior of gradient descent and stochastic gradient…

Optimization and Control · Mathematics 2026-05-15 Sebastian Kassing, Thomas Kruse

Learning Rate Dropout

The performance of a deep neural network is highly dependent on its training, and finding better local optimal solutions is the goal of many optimization algorithms. However, existing optimization algorithms show a preference for descent…

Computer Vision and Pattern Recognition · Computer Science 2019-12-06 Huangxing Lin, Weihong Zeng, Xinghao Ding, Yue Huang, Chenxi Huang, John Paisley