Related papers: Randomized Automatic Differentiation

The simple essence of automatic differentiation

Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep…

Programming Languages · Computer Science 2018-10-03 Conal Elliott

Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator

Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural…

Machine Learning · Computer Science 2019-08-30 Fei Wang, Daniel Zheng, James Decker, Xilun Wu, Grégory M. Essertel, Tiark Rompf

Learning Hidden Dynamics using Intelligent Automatic Differentiation

Many engineering problems involve learning hidden dynamics from indirect observations, where the physical processes are described by systems of partial differential equations (PDE). Gradient-based optimization methods are considered…

Numerical Analysis · Mathematics 2019-12-17 Kailai Xu, Dongzhuo Li, Eric Darve, Jerry M. Harris

Automatic differentiation in machine learning: a survey

Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more…

Symbolic Computation · Computer Science 2018-07-18 Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, Jeffrey Mark Siskind

A Brief Introduction to Automatic Differentiation for Machine Learning

Machine learning and neural network models in particular have been improving the state of the art performance on many artificial intelligence related tasks. Neural network models are typically implemented using frameworks that perform…

Machine Learning · Computer Science 2021-10-18 Davan Harrison

DaCe AD: Unifying High-Performance Automatic Differentiation for Machine Learning and Scientific Computing

Automatic differentiation (AD) is a set of techniques that systematically applies the chain rule to compute the gradients of functions without requiring human intervention. Although the fundamentals of this technology were established…

Machine Learning · Computer Science 2025-09-03 Afif Boudaoud, Alexandru Calotoiu, Marcin Copik, Torsten Hoefler

Tricks from Deep Learning

The deep learning community has devised a diverse set of methods to make gradient optimization, using large datasets, of large and highly complex models with deeply cascaded nonlinearities, practical. Taken as a whole, these methods…

Machine Learning · Computer Science 2016-11-14 Atılım Güneş Baydin, Barak A. Pearlmutter, Jeffrey Mark Siskind

Automatic Differentiation of Programs with Discrete Randomness

Automatic differentiation (AD), a technique for constructing new programs which compute the derivative of an original program, has become ubiquitous throughout scientific computing and deep learning due to the improved performance afforded…

Machine Learning · Computer Science 2023-01-10 Gaurav Arya, Moritz Schauer, Frank Schäfer, Chris Rackauckas

Stable and efficient differentiation of tensor network algorithms

Gradient based optimization methods are the established state-of-the-art paradigm to study strongly entangled quantum systems in two dimensions with Projected Entangled Pair States. However, the key ingredient, the gradient itself, has…

Quantum Physics · Physics 2025-04-15 Anna Francuz, Norbert Schuch, Bram Vanhecke

Differentiable Agent-Based Simulation for Gradient-Guided Simulation-Based Optimization

Simulation-based optimization using agent-based models is typically carried out under the assumption that the gradient describing the sensitivity of the simulation output to the input cannot be evaluated directly. To still apply…

Machine Learning · Computer Science 2021-03-24 Philipp Andelfinger

Conformal Symplectic Optimization for Stable Reinforcement Learning

Training deep reinforcement learning (RL) agents necessitates overcoming the highly unstable nonconvex stochastic optimization inherent in the trial-and-error mechanism. To tackle this challenge, we propose a physics-inspired optimization…

Machine Learning · Computer Science 2024-12-10 Yao Lyu, Xiangteng Zhang, Shengbo Eben Li, Jingliang Duan, Letian Tao, Qing Xu, Lei He, Keqiang Li

Fixed-Point Automatic Differentiation of Forward-Backward Splitting Algorithms for Partly Smooth Functions

A large class of non-smooth practical optimization problems can be written as minimization of a sum of smooth and partly smooth functions. We examine such structured problems which also depend on a parameter vector and study the problem of…

Optimization and Control · Mathematics 2024-10-28 Sheheryar Mehmood, Peter Ochs

Learning in Integer Latent Variable Models with Nested Automatic Differentiation

We develop nested automatic differentiation (AD) algorithms for exact inference and learning in integer latent variable models. Recently, Winner, Sujono, and Sheldon showed how to reduce marginalization in a class of integer latent variable…

Machine Learning · Statistics 2018-06-11 Daniel Sheldon, Kevin Winner, Debora Sujono

Peering Beyond the Gradient Veil with Distributed Auto Differentiation

Although distributed machine learning has opened up many new and exciting research frontiers, fragmentation of models and data across different machines, nodes, and sites still results in considerable communication overhead, impeding…

Machine Learning · Computer Science 2022-02-04 Bradley T. Baker, Aashis Khanal, Vince D. Calhoun, Barak Pearlmutter, Sergey M. Plis

Storchastic: A Framework for General Stochastic Automatic Differentiation

Modelers use automatic differentiation (AD) of computation graphs to implement complex Deep Learning models without defining gradient computations. Stochastic AD extends AD to stochastic computation graphs with sampling steps, which arise…

Machine Learning · Statistics 2021-10-27 Emile van Krieken, Jakub M. Tomczak, Annette ten Teije

Computation of Generalized Derivatives for Abs-Smooth Functions by Backward Mode Algorithmic Differentiation and Implications to Deep Learning

Algorithmic differentiation (AD) tools allow to obtain gradient information of a continuously differentiable objective function in a computationally cheap way using the so-called backward mode. It is common practice to use the same tools…

Optimization and Control · Mathematics 2024-12-02 Lukas Baumgärtner, Franz Bethke

DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks

The performance of deep neural networks is well-known to be sensitive to the setting of their hyperparameters. Recent advances in reverse-mode automatic differentiation allow for optimizing hyperparameters with gradients. The standard way…

Machine Learning · Computer Science 2016-04-07 Jie Fu, Hongyin Luo, Jiashi Feng, Kian Hsiang Low, Tat-Seng Chua

Automatic Differentiation is Essential in Training Neural Networks for Solving Differential Equations

Neural network-based approaches have recently shown significant promise in solving partial differential equations (PDEs) in science and engineering, especially in scenarios featuring complex domains or incorporation of empirical data. One…

Machine Learning · Computer Science 2025-03-19 Chuqi Chen, Yahong Yang, Yang Xiang, Wenrui Hao

Efficient and Sound Differentiable Programming in a Functional Array-Processing Language

Automatic differentiation (AD) is a technique for computing the derivative of a function represented by a program. This technique is considered as the de-facto standard for computing the differentiation in many machine learning and…

Programming Languages · Computer Science 2022-12-21 Amir Shaikhha, Mathieu Huot, Shabnam Ghasemirad, Andrew Fitzgibbon, Simon Peyton Jones, Dimitrios Vytiniotis

Scalable Adaptive Stochastic Optimization Using Random Projections

Adaptive stochastic gradient methods such as AdaGrad have gained popularity in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second order information by…

Machine Learning · Statistics 2016-11-22 Gabriel Krummenacher, Brian McWilliams, Yannic Kilcher, Joachim M. Buhmann, Nicolai Meinshausen