Related papers: Gradients without Backpropagation

Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients

The gradients used to train neural networks are typically computed using backpropagation. While an efficient way to obtain exact gradients, backpropagation is computationally expensive, hinders parallelization, and is biologically…

Machine Learning · Computer Science 2026-01-14 Katharina Flügel , Daniel Coquelin , Marie Weiel , Charlotte Debus , Achim Streit , Markus Götz

Towards Scalable Backpropagation-Free Gradient Estimation

While backpropagation--reverse-mode automatic differentiation--has been extraordinarily successful in deep learning, it requires two passes (forward and backward) through the neural network and the storage of intermediate activations.…

Machine Learning · Computer Science 2025-11-06 Daniel Wang , Evan Markou , Dylan Campbell

Can Forward Gradient Match Backpropagation?

Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient…

Machine Learning · Computer Science 2023-06-13 Louis Fournier , Stéphane Rivaud , Eugene Belilovsky , Michael Eickenberg , Edouard Oyallon

Backpropagation in the Simply Typed Lambda-calculus with Linear Negation

Backpropagation is a classic automatic differentiation algorithm computing the gradient of functions specified by a certain class of simple, first-order programs, called computational graphs. It is a fundamental tool in several fields, most…

Logic in Computer Science · Computer Science 2019-11-07 Alois Brunel , Damiano Mazza , Michele Pagani

Optimization without Backpropagation

Forward gradients have been recently introduced to bypass backpropagation in autodifferentiation, while retaining unbiased estimators of true gradients. We derive an optimality condition to obtain best approximating forward gradients, which…

Machine Learning · Computer Science 2022-09-15 Gabriel Belouze

Forward and Reverse Gradient-Based Hyperparameter Optimization

We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent. These procedures mirror…

Machine Learning · Statistics 2017-12-13 Luca Franceschi , Michele Donini , Paolo Frasconi , Massimiliano Pontil

First order online optimisation using forward gradients in over-parameterised systems

The success of deep learning over the past decade mainly relies on gradient-based optimisation and backpropagation. This paper focuses on analysing the performance of first-order gradient-based optimisation algorithms, gradient descent and…

Optimization and Control · Mathematics 2022-12-08 Behnam Mafakheri , Iman Shames , Jonathan H. Manton

Gradient Descent: The Ultimate Optimizer

Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as its step size. Recent work has shown how the step size can itself be optimized alongside the model…

Machine Learning · Computer Science 2022-10-18 Kartik Chandra , Audrey Xie , Jonathan Ragan-Kelley , Erik Meijer

Automatic Differentiation of Algorithms for Machine Learning

Automatic differentiation---the mechanical transformation of numeric computer programs to calculate derivatives efficiently and accurately---dates to the origin of the computer age. Reverse mode automatic differentiation both antedates and…

Machine Learning · Computer Science 2014-04-30 Atilim Gunes Baydin , Barak A. Pearlmutter

A New Backpropagation Algorithm without Gradient Descent

The backpropagation algorithm, which had been originally introduced in the 1970s, is the workhorse of learning in neural networks. This backpropagation algorithm makes use of the famous machine learning algorithm known as Gradient Descent,…

Machine Learning · Computer Science 2018-02-02 Varun Ranganathan , S. Natarajan

Backpropagation generalized for output derivatives

Backpropagation algorithm is the cornerstone for neural network analysis. Paper extends it for training any derivatives of neural network's output with respect to its input. By the dint of it feedforward networks can be used to solve or…

Neural and Evolutionary Computing · Computer Science 2017-12-13 V. I. Avrutskiy

Gradient-based Hyperparameter Optimization through Reversible Learning

Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the…

Machine Learning · Statistics 2015-04-03 Dougal Maclaurin , David Duvenaud , Ryan P. Adams

Denotationally Correct, Purely Functional, Efficient Reverse-mode Automatic Differentiation

Reverse-mode differentiation is used for optimization, but it introduces references, which break the purity of the underlying programs, making them notoriously harder to optimize. We present a reverse-mode differentiation on a purely…

Programming Languages · Computer Science 2023-04-27 Mathieu Huot , Amir Shaikhha

Proximal Backpropagation

We propose proximal backpropagation (ProxProp) as a novel algorithm that takes implicit instead of explicit gradient steps to update the network parameters during neural network training. Our algorithm is motivated by the step size…

Machine Learning · Computer Science 2018-02-21 Thomas Frerix , Thomas Möllenhoff , Michael Moeller , Daniel Cremers

Computation of Generalized Derivatives for Abs-Smooth Functions by Backward Mode Algorithmic Differentiation and Implications to Deep Learning

Algorithmic differentiation (AD) tools allow to obtain gradient information of a continuously differentiable objective function in a computationally cheap way using the so-called backward mode. It is common practice to use the same tools…

Optimization and Control · Mathematics 2024-12-02 Lukas Baumgärtner , Franz Bethke

The simple essence of automatic differentiation

Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep…

Programming Languages · Computer Science 2018-10-03 Conal Elliott

Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator

Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural…

Machine Learning · Computer Science 2019-08-30 Fei Wang , Daniel Zheng , James Decker , Xilun Wu , Grégory M. Essertel , Tiark Rompf

Faster Biological Gradient Descent Learning

Back-propagation is a popular machine learning algorithm that uses gradient descent in training neural networks for supervised learning, but can be very slow. A number of algorithms have been developed to speed up convergence and improve…

Neural and Evolutionary Computing · Computer Science 2020-09-29 Ho Ling Li

Analyzing Inexact Hypergradients for Bilevel Learning

Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters…

Optimization and Control · Mathematics 2023-11-16 Matthias J. Ehrhardt , Lindon Roberts

On the complexity of nonsmooth automatic differentiation

Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. The overhead complexity of the…

Numerical Analysis · Mathematics 2023-02-07 Jérôme Bolte , Ryan Boustany , Edouard Pauwels , Béatrice Pesquet-Popescu