Related papers: Gradients without Backpropagation
The gradients used to train neural networks are typically computed using backpropagation. While an efficient way to obtain exact gradients, backpropagation is computationally expensive, hinders parallelization, and is biologically…
While backpropagation--reverse-mode automatic differentiation--has been extraordinarily successful in deep learning, it requires two passes (forward and backward) through the neural network and the storage of intermediate activations.…
Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient…
Backpropagation is a classic automatic differentiation algorithm computing the gradient of functions specified by a certain class of simple, first-order programs, called computational graphs. It is a fundamental tool in several fields, most…
Forward gradients have been recently introduced to bypass backpropagation in autodifferentiation, while retaining unbiased estimators of true gradients. We derive an optimality condition to obtain best approximating forward gradients, which…
We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent. These procedures mirror…
The success of deep learning over the past decade mainly relies on gradient-based optimisation and backpropagation. This paper focuses on analysing the performance of first-order gradient-based optimisation algorithms, gradient descent and…
Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as its step size. Recent work has shown how the step size can itself be optimized alongside the model…
Automatic differentiation---the mechanical transformation of numeric computer programs to calculate derivatives efficiently and accurately---dates to the origin of the computer age. Reverse mode automatic differentiation both antedates and…
The backpropagation algorithm, which had been originally introduced in the 1970s, is the workhorse of learning in neural networks. This backpropagation algorithm makes use of the famous machine learning algorithm known as Gradient Descent,…
Backpropagation algorithm is the cornerstone for neural network analysis. Paper extends it for training any derivatives of neural network's output with respect to its input. By the dint of it feedforward networks can be used to solve or…
Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the…
Reverse-mode differentiation is used for optimization, but it introduces references, which break the purity of the underlying programs, making them notoriously harder to optimize. We present a reverse-mode differentiation on a purely…
We propose proximal backpropagation (ProxProp) as a novel algorithm that takes implicit instead of explicit gradient steps to update the network parameters during neural network training. Our algorithm is motivated by the step size…
Algorithmic differentiation (AD) tools allow to obtain gradient information of a continuously differentiable objective function in a computationally cheap way using the so-called backward mode. It is common practice to use the same tools…
Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep…
Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural…
Back-propagation is a popular machine learning algorithm that uses gradient descent in training neural networks for supervised learning, but can be very slow. A number of algorithms have been developed to speed up convergence and improve…
Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters…
Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. The overhead complexity of the…