Related papers: Denotationally Correct, Purely Functional, Efficie…

Reverse AD at Higher Types: Pure, Principled and Denotationally Correct

We show how to define forward- and reverse-mode automatic differentiation source-code transformations on a standard higher-order functional language. The transformations generate purely functional code, and they are principled in the…

Programming Languages · Computer Science 2021-01-25 Matthijs Vákár

Gradients without Backpropagation

Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic…

Machine Learning · Computer Science 2022-02-18 Atılım Güneş Baydin, Barak A. Pearlmutter, Don Syme, Frank Wood, Philip Torr

Differentiating a Tensor Language

How does one compile derivatives of tensor programs, such that the resulting code is purely functional (hence easier to optimize and parallelize) and provably efficient relative to the original program? We show that naively differentiating…

Programming Languages · Computer Science 2020-10-01 Gilbert Bernstein, Michael Mara, Tzu-Mao Li, Dougal Maclaurin, Jonathan Ragan-Kelley

A Simple Differentiable Programming Language

Automatic differentiation plays a prominent role in scientific computing and in modern machine learning, often in the context of powerful programming systems. The relation of the various embodiments of automatic differentiation to the…

Programming Languages · Computer Science 2020-02-04 Martin Abadi, Gordon D. Plotkin

A Differential-form Pullback Programming Language for Higher-order Reverse-mode Automatic Differentiation

Building on the observation that reverse-mode automatic differentiation (AD), a generalisation of backpropagation, can naturally be expressed as pullbacks of differential 1-forms, we design a simple higher-order programming language…

Programming Languages · Computer Science 2020-02-20 Carol Mak, Luke Ong

Forward and Reverse Gradient-Based Hyperparameter Optimization

We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent. These procedures mirror…

Machine Learning · Statistics 2017-12-13 Luca Franceschi, Michele Donini, Paolo Frasconi, Massimiliano Pontil

Efficient and Sound Differentiable Programming in a Functional Array-Processing Language

Automatic differentiation (AD) is a technique for computing the derivative of a function represented by a program. This technique is considered the de facto standard for computing derivatives in many machine learning and…

Programming Languages · Computer Science 2022-12-21 Amir Shaikhha, Mathieu Huot, Shabnam Ghasemirad, Andrew Fitzgibbon, Simon Peyton Jones, Dimitrios Vytiniotis

Denotational Correctness of Forward-Mode Automatic Differentiation for Iteration and Recursion

We present semantic correctness proofs of forward-mode Automatic Differentiation (AD) for languages with sources of partiality such as partial operations, lazy conditionals on real parameters, iteration, and term and type recursion. We…

Programming Languages · Computer Science 2024-05-28 Matthijs Vákár

Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator

Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural…

Machine Learning · Computer Science 2019-08-30 Fei Wang, Daniel Zheng, James Decker, Xilun Wu, Grégory M. Essertel, Tiark Rompf

The simple essence of automatic differentiation

Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep…

Programming Languages · Computer Science 2018-10-03 Conal Elliott

Generalized Optimization: A First Step Towards Category Theoretic Learning Theory

The Cartesian reverse derivative is a categorical generalization of reverse-mode automatic differentiation. We use this operator to generalize several optimization algorithms, including a straightforward generalization of gradient descent…

Optimization and Control · Mathematics 2021-09-22 Dan Shiebler

Reverse-Mode Automatic Differentiation of Compiled Programs

Tools for algorithmic differentiation (AD) provide accurate derivatives of computer-implemented functions for use in, e.g., optimization and machine learning (ML). However, they often require the source code of the function to be available…

Mathematical Software · Computer Science 2022-12-29 Max Aehle, Johannes Blühdorn, Max Sagebaum, Nicolas R. Gauger

Decomposing reverse-mode automatic differentiation

We decompose reverse-mode automatic differentiation into (forward-mode) linearization followed by transposition. Doing so isolates the essential difference between forward- and reverse-mode AD, and simplifies their joint implementation. In…

Programming Languages · Computer Science 2021-05-21 Roy Frostig, Matthew J. Johnson, Dougal Maclaurin, Adam Paszke, Alexey Radul

Dual-Numbers Reverse AD for Functional Array Languages

The standard dual-numbers construction works well for forward-mode automatic differentiation (AD) and is attractive due to its simplicity; recently, it has also been adapted to reverse-mode AD, but practical performance, especially on array…

Programming Languages · Computer Science 2025-07-18 Tom Smeding, Mikołaj Konarski, Simon Peyton Jones, Andrew Fitzgibbon

Backpropagation in the Simply Typed Lambda-calculus with Linear Negation

Backpropagation is a classic automatic differentiation algorithm computing the gradient of functions specified by a certain class of simple, first-order programs, called computational graphs. It is a fundamental tool in several fields, most…

Logic in Computer Science · Computer Science 2019-11-07 Alois Brunel, Damiano Mazza, Michele Pagani

DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks

The performance of deep neural networks is well known to be sensitive to the setting of their hyperparameters. Recent advances in reverse-mode automatic differentiation allow for optimizing hyperparameters with gradients. The standard way…

Machine Learning · Computer Science 2016-04-07 Jie Fu, Hongyin Luo, Jiashi Feng, Kian Hsiang Low, Tat-Seng Chua

Towards Scalable Backpropagation-Free Gradient Estimation

While backpropagation (reverse-mode automatic differentiation) has been extraordinarily successful in deep learning, it requires two passes (forward and backward) through the neural network and the storage of intermediate activations.…

Machine Learning · Computer Science 2025-11-06 Daniel Wang, Evan Markou, Dylan Campbell

Efficient Learning of Generative Models via Finite-Difference Score Matching

Several machine learning applications involve the optimization of higher-order derivatives (e.g., gradients of gradients) during training, which can be expensive with respect to memory and computation even with automatic differentiation. As a…

Machine Learning · Computer Science 2020-11-26 Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, Jun Zhu

Efficient and Modular Implicit Differentiation

Automatic differentiation (autodiff) has revolutionized machine learning. It allows expressing complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently,…

Machine Learning · Computer Science 2022-10-13 Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert

High-Order Reduced-Gradient Methods for Composite Variational Inequalities

This paper can be seen as an attempt at rethinking the Extra-Gradient Philosophy for solving Variational Inequality Problems. We show that the properly defined Reduced Gradients can be used instead for finding approximate…

Optimization and Control · Mathematics 2023-12-05 Yurii Nesterov