Related papers: Gradient-based Hyperparameter Optimization through…

Gradient Descent: The Ultimate Optimizer

Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as its step size. Recent work has shown how the step size can itself be optimized alongside the model…

Machine Learning · Computer Science 2022-10-18 Kartik Chandra, Audrey Xie, Jonathan Ragan-Kelley, Erik Meijer

Forward and Reverse Gradient-Based Hyperparameter Optimization

We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent. These procedures mirror…

Machine Learning · Statistics 2017-12-13 Luca Franceschi, Michele Donini, Paolo Frasconi, Massimiliano Pontil

Gradient Descent with Provably Tuned Learning-rate Schedules

Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one typically sets them using heuristic approaches…

Machine Learning · Computer Science 2025-12-05 Dravyansh Sharma

Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters

Hyperparameter selection generally relies on running multiple full training trials, with selection based on validation set performance. We propose a gradient-based approach for locally adjusting hyperparameters during training of the model.…

Machine Learning · Computer Science 2016-06-20 Jelena Luketina, Mathias Berglund, Klaus Greff, Tapani Raiko

Exploring the Optimized Value of Each Hyperparameter in Various Gradient Descent Algorithms

In recent years, various gradient descent algorithms including the methods of gradient descent, gradient descent with momentum, adaptive gradient (AdaGrad), root-mean-square propagation (RMSProp) and adaptive moment estimation (Adam)…

Machine Learning · Computer Science 2024-09-19 Abel C. H. Chen

Analyzing Inexact Hypergradients for Bilevel Learning

Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters…

Optimization and Control · Mathematics 2023-11-16 Matthias J. Ehrhardt, Lindon Roberts

Practical recommendations for gradient-based training of deep architectures

Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of…

Machine Learning · Computer Science 2012-09-18 Yoshua Bengio

AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning

Deep neural networks have seen great success in recent years; however, training a deep model is often challenging as its performance heavily depends on the hyper-parameters used. In addition, finding the optimal hyper-parameter…

Machine Learning · Computer Science 2022-03-17 Krishnateja Killamsetty, Guttu Sai Abhishek, Aakriti, Alexandre V. Evfimievski, Lucian Popa, Ganesh Ramakrishnan, Rishabh Iyer

Hyperparameter optimization with approximate gradient

Most models in machine learning contain at least one hyperparameter to control for model complexity. Choosing an appropriate set of hyperparameters is both crucial in terms of model accuracy and computationally challenging. In this work we…

Machine Learning · Statistics 2022-11-22 Fabian Pedregosa

Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus

Common regularization algorithms for linear regression, such as LASSO and Ridge regression, rely on a regularization hyperparameter that balances the tradeoff between minimizing the fitting error and the norm of the learned model…

Machine Learning · Computer Science 2023-11-27 Gabriele Maroni, Loris Cannelli, Dario Piga

Differentiable Self-Adaptive Learning Rate

Learning rate adaptation is a popular topic in machine learning. Gradient descent trains a neural network with a fixed learning rate. Learning rate adaptation is proposed to accelerate the training process by adjusting the step size in…

Machine Learning · Computer Science 2022-10-20 Bozhou Chen, Hongzhi Wang, Chenmin Ba

Online Learning Rate Adaptation with Hypergradient Descent

We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by…

Machine Learning · Computer Science 2018-08-23 Atilim Gunes Baydin, Robert Cornish, David Martinez Rubio, Mark Schmidt, Frank Wood

Gradient Descent based Optimization Algorithms for Deep Learning Models Training

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…

Machine Learning · Computer Science 2019-03-12 Jiawei Zhang

Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients

The gradients used to train neural networks are typically computed using backpropagation. While an efficient way to obtain exact gradients, backpropagation is computationally expensive, hinders parallelization, and is biologically…

Machine Learning · Computer Science 2026-01-14 Katharina Flügel, Daniel Coquelin, Marie Weiel, Charlotte Debus, Achim Streit, Markus Götz

Gradient descent revisited via an adaptive online learning rate

Any gradient descent optimization requires choosing a learning rate. With deeper and deeper models, tuning that learning rate can easily become tedious and does not necessarily lead to an ideal convergence. We propose a variation of the…

Machine Learning · Statistics 2018-04-10 Mathieu Ravaut, Satya Gorti

Gradients without Backpropagation

Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic…

Machine Learning · Computer Science 2022-02-18 Atılım Güneş Baydin, Barak A. Pearlmutter, Don Syme, Frank Wood, Philip Torr

Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation

Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable…

Machine Learning · Computer Science 2022-04-22 Ross M. Clarke, Elre T. Oldewage, José Miguel Hernández-Lobato

On the Iteration Complexity of Hypergradient Computation

We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include…

Machine Learning · Statistics 2020-07-13 Riccardo Grazzi, Luca Franceschi, Massimiliano Pontil, Saverio Salzo

A Note on Uncertainty Quantification for Maximum Likelihood Parameters Estimated with Heuristic Based Optimization Algorithms

Gradient-based solvers risk convergence to local optima, leading to incorrect researcher inference. Heuristic-based algorithms are able to "break free" of these local optima to eventually converge to the true global optimum. However, given…

Econometrics · Economics 2024-01-17 Zachary Porreca

Learning to Learn without Gradient Descent by Gradient Descent

We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad…

Machine Learning · Statistics 2017-06-13 Yutian Chen, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Timothy P. Lillicrap, Matt Botvinick, Nando de Freitas