Related papers: Forward and Reverse Gradient-Based Hyperparameter …

200 papers

Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the…

Machine Learning · Statistics 2015-04-03 Dougal Maclaurin, David Duvenaud, Ryan P. Adams

Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic…

Machine Learning · Computer Science 2022-02-18 Atılım Güneş Baydin, Barak A. Pearlmutter, Don Syme, Frank Wood, Philip Torr

Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one typically sets them using heuristic approaches…

Machine Learning · Computer Science 2025-12-05 Dravyansh Sharma

Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient…

Machine Learning · Computer Science 2023-06-13 Louis Fournier, Stéphane Rivaud, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters…

Optimization and Control · Mathematics 2023-11-16 Matthias J. Ehrhardt, Lindon Roberts

The performance of deep neural networks is well known to be sensitive to the setting of their hyperparameters. Recent advances in reverse-mode automatic differentiation allow for optimizing hyperparameters with gradients. The standard way…

Machine Learning · Computer Science 2016-04-07 Jie Fu, Hongyin Luo, Jiashi Feng, Kian Hsiang Low, Tat-Seng Chua

The gradients used to train neural networks are typically computed using backpropagation. While an efficient way to obtain exact gradients, backpropagation is computationally expensive, hinders parallelization, and is biologically…

Machine Learning · Computer Science 2026-01-14 Katharina Flügel, Daniel Coquelin, Marie Weiel, Charlotte Debus, Achim Streit, Markus Götz

We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by…

Machine Learning · Computer Science 2018-08-23 Atilim Gunes Baydin, Robert Cornish, David Martinez Rubio, Mark Schmidt, Frank Wood

In recent years, various gradient descent algorithms including the methods of gradient descent, gradient descent with momentum, adaptive gradient (AdaGrad), root-mean-square propagation (RMSProp) and adaptive moment estimation (Adam)…

Machine Learning · Computer Science 2024-09-19 Abel C. H. Chen

While backpropagation (reverse-mode automatic differentiation) has been extraordinarily successful in deep learning, it requires two passes (forward and backward) through the neural network and the storage of intermediate activations.…

Machine Learning · Computer Science 2025-11-06 Daniel Wang, Evan Markou, Dylan Campbell

We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include…

Machine Learning · Statistics 2020-07-13 Riccardo Grazzi, Luca Franceschi, Massimiliano Pontil, Saverio Salzo

In standard neural network training, the gradients in the backward pass are determined by the forward pass. As a result, the two stages are coupled. This is how most neural networks are trained currently. However, gradient modification in…

Machine Learning · Computer Science 2022-09-21 Bishshoy Das, Milton Mondal, Brejesh Lall, Shiv Dutt Joshi, Sumantra Dutta Roy

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…

Machine Learning · Computer Science 2019-03-12 Jiawei Zhang

Gradient-based hyperparameter optimization has earned widespread popularity in the context of few-shot meta-learning, but remains broadly impractical for tasks with long horizons (many gradient steps), due to memory scaling and gradient…

Machine Learning · Computer Science 2021-10-01 Paul Micaelli, Amos Storkey

Computing the gradient of a function provides fundamental information about its behavior. This information is essential for several applications and algorithms across various fields. One common application that requires gradients is…

Numerical Analysis · Mathematics 2022-06-09 Esmail Abdul Fattah, Janet Van Niekerk, Haavard Rue

Supervised learning in deep neural networks is commonly performed using error backpropagation. However, the sequential propagation of errors during the backward pass limits its scalability and applicability to low-powered neuromorphic…

Machine Learning · Computer Science 2023-08-22 Florian Bacho, Dominique Chu

Bilevel optimization is a powerful tool for many machine learning problems, such as hyperparameter optimization and meta-learning. Estimating hypergradients (also known as implicit gradients) is crucial for developing gradient-based methods…

Optimization and Control · Mathematics 2025-05-06 Youran Dong, Junfeng Yang, Wei Yao, Jin Zhang

Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as its step size. Recent work has shown how the step size can itself be optimized alongside the model…

Machine Learning · Computer Science 2022-10-18 Kartik Chandra, Audrey Xie, Jonathan Ragan-Kelley, Erik Meijer

Interpreting gradient methods as fixed-point iterations, we provide a detailed analysis of those methods for minimizing convex objective functions. Due to their conceptual and algorithmic simplicity, gradient methods are widely used in…

Machine Learning · Statistics 2017-08-16 Alexander Jung

How can neural networks be trained on large-volume temporal data efficiently? To compute the gradients required to update parameters, backpropagation blocks computations until the forward and backward passes are completed. For temporal…

Computer Vision and Pattern Recognition · Computer Science 2021-07-13 Mateusz Malinowski, Dimitrios Vytiniotis, Grzegorz Swirszcz, Viorica Patraucean, Joao Carreira