Related papers: Denotationally Correct, Purely Functional, Efficie…
We show how to define forward- and reverse-mode automatic differentiation source-code transformations or on a standard higher-order functional language. The transformations generate purely functional code, and they are principled in the…
Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic…
How does one compile derivatives of tensor programs, such that the resulting code is purely functional (hence easier to optimize and parallelize) and provably efficient relative to the original program? We show that naively differentiating…
Automatic differentiation plays a prominent role in scientific computing and in modern machine learning, often in the context of powerful programming systems. The relation of the various embodiments of automatic differentiation to the…
Building on the observation that reverse-mode automatic differentiation (AD) -- a generalisation of backpropagation -- can naturally be expressed as pullbacks of differential 1-forms, we design a simple higher-order programming language…
We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent. These procedures mirror…
Automatic differentiation (AD) is a technique for computing the derivative of a function represented by a program. This technique is considered as the de-facto standard for computing the differentiation in many machine learning and…
We present semantic correctness proofs of forward-mode Automatic Differentiation (AD) for languages with sources of partiality such as partial operations, lazy conditionals on real parameters, iteration, and term and type recursion. We…
Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural…
Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep…
The Cartesian reverse derivative is a categorical generalization of reverse-mode automatic differentiation. We use this operator to generalize several optimization algorithms, including a straightforward generalization of gradient descent…
Tools for algorithmic differentiation (AD) provide accurate derivatives of computer-implemented functions for use in, e. g., optimization and machine learning (ML). However, they often require the source code of the function to be available…
We decompose reverse-mode automatic differentiation into (forward-mode) linearization followed by transposition. Doing so isolates the essential difference between forward- and reverse-mode AD, and simplifies their joint implementation. In…
The standard dual-numbers construction works well for forward-mode automatic differentiation (AD) and is attractive due to its simplicity; recently, it also has been adapted to reverse-mode AD, but practical performance, especially on array…
Backpropagation is a classic automatic differentiation algorithm computing the gradient of functions specified by a certain class of simple, first-order programs, called computational graphs. It is a fundamental tool in several fields, most…
The performance of deep neural networks is well-known to be sensitive to the setting of their hyperparameters. Recent advances in reverse-mode automatic differentiation allow for optimizing hyperparameters with gradients. The standard way…
While backpropagation--reverse-mode automatic differentiation--has been extraordinarily successful in deep learning, it requires two passes (forward and backward) through the neural network and the storage of intermediate activations.…
Several machine learning applications involve the optimization of higher-order derivatives (e.g., gradients of gradients) during training, which can be expensive in respect to memory and computation even with automatic differentiation. As a…
Automatic differentiation (autodiff) has revolutionized machine learning. It allows to express complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently,…
This paper can be seen as an attempt of rethinking the {\em Extra-Gradient Philosophy} for solving Variational Inequality Problems. We show that the properly defined {\em Reduced Gradients} can be used instead for finding approximate…