Related papers: Tricks from Deep Learning

Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator

Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural…

Machine Learning · Computer Science 2019-08-30 Fei Wang , Daniel Zheng , James Decker , Xilun Wu , Grégory M. Essertel , Tiark Rompf

Randomized Automatic Differentiation

The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD…

Machine Learning · Computer Science 2021-03-16 Deniz Oktay , Nick McGreivy , Joshua Aduol , Alex Beatson , Ryan P. Adams

Peering Beyond the Gradient Veil with Distributed Auto Differentiation

Although distributed machine learning has opened up many new and exciting research frontiers, fragmentation of models and data across different machines, nodes, and sites still results in considerable communication overhead, impeding…

Machine Learning · Computer Science 2022-02-04 Bradley T. Baker , Aashis Khanal , Vince D. Calhoun , Barak Pearlmutter , Sergey M. Plis

A Brief Introduction to Automatic Differentiation for Machine Learning

Machine learning and neural network models in particular have been improving the state of the art performance on many artificial intelligence related tasks. Neural network models are typically implemented using frameworks that perform…

Machine Learning · Computer Science 2021-10-18 Davan Harrison

Stable and efficient differentiation of tensor network algorithms

Gradient based optimization methods are the established state-of-the-art paradigm to study strongly entangled quantum systems in two dimensions with Projected Entangled Pair States. However, the key ingredient, the gradient itself, has…

Quantum Physics · Physics 2025-04-15 Anna Francuz , Norbert Schuch , Bram Vanhecke

Learning Hidden Dynamics using Intelligent Automatic Differentiation

Many engineering problems involve learning hidden dynamics from indirect observations, where the physical processes are described by systems of partial differential equations (PDE). Gradient-based optimization methods are considered…

Numerical Analysis · Mathematics 2019-12-17 Kailai Xu , Dongzhuo Li , Eric Darve , Jerry M. Harris

The simple essence of automatic differentiation

Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep…

Programming Languages · Computer Science 2018-10-03 Conal Elliott

Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets

An important class of problems involves training deep neural networks with sparse prediction targets of very high dimension D. These occur naturally in e.g. neural language models or the learning of word-embeddings, often posed as…

Neural and Evolutionary Computing · Computer Science 2015-07-15 Pascal Vincent , Alexandre de Brébisson , Xavier Bouthillier

Adaptive Pruning of Deep Neural Networks for Resource-Aware Embedded Intrusion Detection on the Edge

Artificial neural network pruning is a method in which artificial neural network sizes can be reduced while attempting to preserve the predicting capabilities of the network. This is done to make the model smaller or faster during inference…

Machine Learning · Computer Science 2025-05-21 Alexandre Broggi , Nathaniel Bastian , Lance Fiondella , Gokhan Kul

Computation of Generalized Derivatives for Abs-Smooth Functions by Backward Mode Algorithmic Differentiation and Implications to Deep Learning

Algorithmic differentiation (AD) tools allow to obtain gradient information of a continuously differentiable objective function in a computationally cheap way using the so-called backward mode. It is common practice to use the same tools…

Optimization and Control · Mathematics 2024-12-02 Lukas Baumgärtner , Franz Bethke

Adaptive Gradient Regularization: A Faster and Generalizable Optimization Technique for Deep Neural Networks

Stochastic optimization plays a crucial role in the advancement of deep learning technologies. Over the decades, significant effort has been dedicated to improving the training efficiency and robustness of deep neural networks, via various…

Machine Learning · Computer Science 2024-08-21 Huixiu Jiang , Ling Yang , Yu Bao , Rutong Si , Sikun Yang

DaCe AD: Unifying High-Performance Automatic Differentiation for Machine Learning and Scientific Computing

Automatic differentiation (AD) is a set of techniques that systematically applies the chain rule to compute the gradients of functions without requiring human intervention. Although the fundamentals of this technology were established…

Machine Learning · Computer Science 2025-09-03 Afif Boudaoud , Alexandru Calotoiu , Marcin Copik , Torsten Hoefler

Towards Differentiable Multilevel Optimization: A Gradient-Based Approach

Multilevel optimization has gained renewed interest in machine learning due to its promise in applications such as hyperparameter tuning and continual learning. However, existing methods struggle with the inherent difficulty of efficiently…

Machine Learning · Computer Science 2024-10-16 Yuntian Gu , Xuzheng Chen

Deep Learning Advancements in Anomaly Detection: A Comprehensive Survey

The rapid expansion of data from diverse sources has made anomaly detection (AD) increasingly essential for identifying unexpected observations that may signal system failures, security breaches, or fraud. As datasets become more complex…

Machine Learning · Computer Science 2025-03-18 Haoqi Huang , Ping Wang , Jianhua Pei , Jiacheng Wang , Shahen Alexanian , Dusit Niyato

Learning Gradient Descent: Better Generalization and Longer Horizons

Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and…

Machine Learning · Computer Science 2017-06-13 Kaifeng Lv , Shunhua Jiang , Jian Li

TAdam: A Robust Stochastic Gradient Optimizer

Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in robotics domain. To perform well even with such noise, we expect them to be able to detect outliers and discard them when…

Machine Learning · Computer Science 2020-03-04 Wendyam Eric Lionel Ilboudo , Taisuke Kobayashi , Kenji Sugimoto

Differentiable Programming Tensor Networks

Differentiable programming is a fresh programming paradigm which composes parameterized algorithmic components and trains them using automatic differentiation (AD). The concept emerges from deep learning but is not only limited to training…

Strongly Correlated Electrons · Physics 2019-09-11 Hai-Jun Liao , Jin-Guo Liu , Lei Wang , Tao Xiang

Improving Neural Network Training in Low Dimensional Random Bases

Stochastic Gradient Descent (SGD) has proven to be remarkably effective in optimizing deep neural networks that employ ever-larger numbers of parameters. Yet, improving the efficiency of large-scale optimization remains a vital and highly…

Machine Learning · Computer Science 2020-11-11 Frithjof Gressmann , Zach Eaton-Rosen , Carlo Luschi

Automatic differentiation in machine learning: a survey

Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more…

Symbolic Computation · Computer Science 2018-07-18 Atilim Gunes Baydin , Barak A. Pearlmutter , Alexey Andreyevich Radul , Jeffrey Mark Siskind