Related papers: Gradient Descent: The Ultimate Optimizer

Gradient-based Hyperparameter Optimization through Reversible Learning

Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the…

Machine Learning · Statistics 2015-04-03 Dougal Maclaurin , David Duvenaud , Ryan P. Adams

Gradient Descent with Provably Tuned Learning-rate Schedules

Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one typically sets them using heuristic approaches…

Machine Learning · Computer Science 2025-12-05 Dravyansh Sharma

Exploring the Optimized Value of Each Hyperparameter in Various Gradient Descent Algorithms

In the recent years, various gradient descent algorithms including the methods of gradient descent, gradient descent with momentum, adaptive gradient (AdaGrad), root-mean-square propagation (RMSProp) and adaptive moment estimation (Adam)…

Machine Learning · Computer Science 2024-09-19 Abel C. H. Chen

Analyzing Inexact Hypergradients for Bilevel Learning

Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters…

Optimization and Control · Mathematics 2023-11-16 Matthias J. Ehrhardt , Lindon Roberts

Online Learning Rate Adaptation with Hypergradient Descent

We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by…

Machine Learning · Computer Science 2018-08-23 Atilim Gunes Baydin , Robert Cornish , David Martinez Rubio , Mark Schmidt , Frank Wood

Gradient descent revisited via an adaptive online learning rate

Any gradient descent optimization requires to choose a learning rate. With deeper and deeper models, tuning that learning rate can easily become tedious and does not necessarily lead to an ideal convergence. We propose a variation of the…

Machine Learning · Statistics 2018-04-10 Mathieu Ravaut , Satya Gorti

Optimizing ML Training with Metagradient Descent

A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based…

Machine Learning · Statistics 2025-03-19 Logan Engstrom , Andrew Ilyas , Benjamin Chen , Axel Feldmann , William Moses , Aleksander Madry

Adaptive Conditional Gradient Descent

Selecting an effective step-size is a fundamental challenge in first-order optimization, especially for problems with non-Euclidean geometries. This paper presents a novel adaptive step-size strategy for optimization algorithms that rely on…

Optimization and Control · Mathematics 2025-10-14 Abbas Khademi , Antonio Silveti-Falls

Gradients without Backpropagation

Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic…

Machine Learning · Computer Science 2022-02-18 Atılım Güneş Baydin , Barak A. Pearlmutter , Don Syme , Frank Wood , Philip Torr

A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning

One of the most important parts of Artificial Neural Networks is minimizing the loss functions which tells us how good or bad our model is. To minimize these losses we need to tune the weights and biases. Also to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav

Gradient descent with momentum --- to accelerate or to super-accelerate?

We consider gradient descent with `momentum', a widely used method for loss function minimization in machine learning. This method is often used with `Nesterov acceleration', meaning that the gradient is evaluated not at the current…

Machine Learning · Computer Science 2020-01-20 Goran Nakerst , John Brennan , Masudul Haque

Differentiable Self-Adaptive Learning Rate

Learning rate adaptation is a popular topic in machine learning. Gradient Descent trains neural nerwork with a fixed learning rate. Learning rate adaptation is proposed to accelerate the training process through adjusting the step size in…

Machine Learning · Computer Science 2022-10-20 Bozhou Chen , Hongzhi Wang , Chenmin Ba

Learning Algorithm Hyperparameters for Fast Parametric Convex Optimization

We introduce a machine-learning framework to learn the hyperparameter sequence of first-order methods (e.g., the step sizes in gradient descent) to quickly solve parametric convex optimization problems. Our computational architecture…

Optimization and Control · Mathematics 2024-12-23 Rajiv Sambharya , Bartolomeo Stellato

Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients

The gradients used to train neural networks are typically computed using backpropagation. While an efficient way to obtain exact gradients, backpropagation is computationally expensive, hinders parallelization, and is biologically…

Machine Learning · Computer Science 2026-01-14 Katharina Flügel , Daniel Coquelin , Marie Weiel , Charlotte Debus , Achim Streit , Markus Götz

LOSSGRAD: automatic learning rate in gradient descent

In this paper, we propose a simple, fast and easy to implement algorithm LOSSGRAD (locally optimal step-size in gradient descent), which automatically modifies the step-size in gradient descent during neural networks training. Given a…

Machine Learning · Computer Science 2019-11-26 Bartosz Wójcik , Łukasz Maziarka , Jacek Tabor

Truncated Back-propagation for Bilevel Optimization

Bilevel optimization has been recently revisited for designing and analyzing algorithms in hyperparameter tuning and meta learning tasks. However, due to its nested structure, evaluating exact gradients for high-dimensional problems is…

Machine Learning · Computer Science 2019-04-09 Amirreza Shaban , Ching-An Cheng , Nathan Hatch , Byron Boots

Proximal Backpropagation

We propose proximal backpropagation (ProxProp) as a novel algorithm that takes implicit instead of explicit gradient steps to update the network parameters during neural network training. Our algorithm is motivated by the step size…

Machine Learning · Computer Science 2018-02-21 Thomas Frerix , Thomas Möllenhoff , Michael Moeller , Daniel Cremers

Learning complexity of gradient descent and conjugate gradient algorithms

Gradient Descent (GD) and Conjugate Gradient (CG) methods are among the most effective iterative algorithms for solving unconstrained optimization problems, particularly in machine learning and statistical modeling, where they are employed…

Optimization and Control · Mathematics 2024-12-19 Xianqi Jiao , Jia Liu , Zhiping Chen

L4: Practical loss-based stepsize adaptation for deep learning

We propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly with the loss function and rescales the gradient in order to make fixed predicted progress on the loss. We demonstrate its capabilities by…

Machine Learning · Computer Science 2018-12-03 Michal Rolinek , Georg Martius

Fine-Tuning Adaptive Stochastic Optimizers: Determining the Optimal Hyperparameter $\epsilon$ via Gradient Magnitude Histogram Analysis

Stochastic optimizers play a crucial role in the successful training of deep neural network models. To achieve optimal model performance, designers must carefully select both model and optimizer hyperparameters. However, this process is…

Machine Learning · Computer Science 2024-09-17 Gustavo Silva , Paul Rodriguez