Related papers: Zero Coordinate Shift: Whetted Automatic Different…

DaCe AD: Unifying High-Performance Automatic Differentiation for Machine Learning and Scientific Computing

Automatic differentiation (AD) is a set of techniques that systematically applies the chain rule to compute the gradients of functions without requiring human intervention. Although the fundamentals of this technology were established…

Machine Learning · Computer Science 2025-09-03 Afif Boudaoud, Alexandru Calotoiu, Marcin Copik, Torsten Hoefler

Training Artificial Neural Networks by Coordinate Search Algorithm

Training Artificial Neural Networks poses a challenging and critical problem in machine learning. Despite the effectiveness of gradient-based learning methods, such as Stochastic Gradient Descent (SGD), in training neural networks, they do…

Machine Learning · Computer Science 2024-02-21 Ehsan Rokhsatyazdi, Shahryar Rahnamayan, Sevil Zanjani Miyandoab, Azam Asilian Bidgoli, H. R. Tizhoosh

Coordinate Descent with Online Adaptation of Coordinate Frequencies

Coordinate descent (CD) algorithms have become the method of choice for solving a number of optimization problems in machine learning. They are particularly popular for training linear models, including linear support vector machine…

Machine Learning · Statistics 2014-01-16 Tobias Glasmachers, Ürün Dogan

Automatic differentiation in machine learning: a survey

Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more…

Symbolic Computation · Computer Science 2018-07-18 Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, Jeffrey Mark Siskind

Turning Stale Gradients into Stable Gradients: Coherent Coordinate Descent with Implicit Landscape Smoothing for Lightweight Zeroth-Order Optimization

Zeroth-Order (ZO) optimization is pivotal for scenarios where backpropagation is unavailable, such as memory-constrained on-device learning and black-box optimization. However, existing methods face a stark trade-off: they are either…

Machine Learning · Computer Science 2026-05-29 Chen Liang, Xiatao Sun, Qian Wang, Daniel Rakita

Automatic differentiation for solid mechanics

Automatic differentiation (AD) is an ensemble of techniques that allow to evaluate accurate numerical derivatives of a mathematical function expressed in a computer programming language. In this paper we use AD for stating and solving solid…

Numerical Analysis · Mathematics 2020-01-22 Andrea Vigliotti, Ferdinando Auricchio

Performance Portable Gradient Computations Using Source Transformation

Derivative computation is a key component of optimization, sensitivity analysis, uncertainty quantification, and nonlinear solvers. Automatic differentiation (AD) is a powerful technique for evaluating such derivatives, and in recent years,…

Mathematical Software · Computer Science 2025-07-18 Kim Liegeois, Brian Kelley, Eric Phipps, Sivasankaran Rajamanickam, Vassil Vassilev

Auto-Precision Scaling for Distributed Deep Learning

It has been reported that the communication cost for synchronizing gradients can be a bottleneck, which limits the scalability of distributed deep learning. Using low-precision gradients is a promising technique for reducing the bandwidth…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-18 Ruobing Han, James Demmel, Yang You

Randomized Automatic Differentiation

The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD…

Machine Learning · Computer Science 2021-03-16 Deniz Oktay, Nick McGreivy, Joshua Aduol, Alex Beatson, Ryan P. Adams

Automatic Differentiation in ROOT

In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to evaluate the derivative of a function specified by a computer program. AD exploits the fact that every computer program, no matter how…

Mathematical Software · Computer Science 2021-02-03 Vassil Vassilev, Aleksandr Efremov, Oksana Shadura

Zero-shifting Technique for Deep Neural Network Training on Resistive Cross-point Arrays

A resistive memory device-based computing architecture is one of the promising platforms for energy-efficient Deep Neural Network (DNN) training accelerators. The key technical challenge in realizing such accelerators is to accumulate the…

Emerging Technologies · Computer Science 2019-08-05 Hyungjun Kim, Malte Rasch, Tayfun Gokmen, Takashi Ando, Hiroyuki Miyazoe, Jae-Joon Kim, John Rozen, Seyoung Kim

Automatic Differentiation is Essential in Training Neural Networks for Solving Differential Equations

Neural network-based approaches have recently shown significant promise in solving partial differential equations (PDEs) in science and engineering, especially in scenarios featuring complex domains or incorporation of empirical data. One…

Machine Learning · Computer Science 2025-03-19 Chuqi Chen, Yahong Yang, Yang Xiang, Wenrui Hao

The simple essence of automatic differentiation

Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep…

Programming Languages · Computer Science 2018-10-03 Conal Elliott

Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator

Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural…

Machine Learning · Computer Science 2019-08-30 Fei Wang, Daniel Zheng, James Decker, Xilun Wu, Grégory M. Essertel, Tiark Rompf

Active Learning for Abrupt Shifts Change-point Detection via Derivative-Aware Gaussian Processes

Change-point detection (CPD) is crucial for identifying abrupt shifts in data, which influence decision-making and efficient resource allocation across various domains. To address the challenges posed by the costly and time-intensive data…

Machine Learning · Computer Science 2023-12-07 Hao Zhao, Rong Pan

Computation of Generalized Derivatives for Abs-Smooth Functions by Backward Mode Algorithmic Differentiation and Implications to Deep Learning

Algorithmic differentiation (AD) tools allow to obtain gradient information of a continuously differentiable objective function in a computationally cheap way using the so-called backward mode. It is common practice to use the same tools…

Optimization and Control · Mathematics 2024-12-02 Lukas Baumgärtner, Franz Bethke

Seeing the Whole Picture: Distribution-Guided Data-Free Distillation for Semantic Segmentation

Semantic segmentation requires a holistic understanding of the physical world, as it assigns semantic labels to spatially continuous and structurally coherent objects rather than to isolated pixels. However, existing data-free knowledge…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Hongxuan Sun, Tao Wu

Learning Hidden Dynamics using Intelligent Automatic Differentiation

Many engineering problems involve learning hidden dynamics from indirect observations, where the physical processes are described by systems of partial differential equations (PDE). Gradient-based optimization methods are considered…

Numerical Analysis · Mathematics 2019-12-17 Kailai Xu, Dongzhuo Li, Eric Darve, Jerry M. Harris

Peering Beyond the Gradient Veil with Distributed Auto Differentiation

Although distributed machine learning has opened up many new and exciting research frontiers, fragmentation of models and data across different machines, nodes, and sites still results in considerable communication overhead, impeding…

Machine Learning · Computer Science 2022-02-04 Bradley T. Baker, Aashis Khanal, Vince D. Calhoun, Barak Pearlmutter, Sergey M. Plis

Zero-Shot Learning from scratch (ZFS): leveraging local compositional representations

Zero-shot classification is a generalization task where no instance from the target classes is seen during training. To allow for test-time transfer, each class is annotated with semantic information, commonly in the form of attributes or…

Computer Vision and Pattern Recognition · Computer Science 2020-10-27 Tristan Sylvain, Linda Petrini, R Devon Hjelm