Related papers: Efficient Per-Example Gradient Computations

Efficient Per-Example Gradient Computations in Convolutional Neural Networks

Deep learning frameworks leverage GPUs to perform massively-parallel computations over batches of many training examples efficiently. However, for certain tasks, one may be interested in performing per-example computations, for instance…

Machine Learning · Computer Science 2020-11-17 Gaspar Rochette , Andre Manoel , Eric W. Tramel

Gradient Estimation Using Stochastic Computation Graphs

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external…

Machine Learning · Computer Science 2016-01-06 John Schulman , Nicolas Heess , Theophane Weber , Pieter Abbeel

Smart Gradient -- An Adaptive Technique for Improving Gradient Estimation

Computing the gradient of a function provides fundamental information about its behavior. This information is essential for several applications and algorithms across various fields. One common application that requires gradients is…

Numerical Analysis · Mathematics 2022-06-09 Esmail Abdul Fattah , Janet Van Niekerk , Haavard Rue

Similarity of Neural Networks with Gradients

A suitable similarity index for comparing learnt neural networks plays an important role in understanding the behaviour of the highly-nonlinear functions, and can provide insights on further theoretical analysis and empirical studies. We…

Machine Learning · Computer Science 2020-03-26 Shuai Tang , Wesley J. Maddox , Charlie Dickens , Tom Diethe , Andreas Damianou

Gradients as Features for Deep Representation Learning

We address the challenging problem of deep representation learning--the efficient adaption of a pre-trained deep network to different tasks. Specifically, we propose to explore gradient-based features. These features are gradients of the…

Machine Learning · Computer Science 2020-04-14 Fangzhou Mu , Yingyu Liang , Yin Li

Regression Trees Know Calculus

Regression trees have emerged as a preeminent tool for solving real-world regression problems due to their ability to deal with nonlinearities, interaction effects and sharp discontinuities. In this article, we rather study regression trees…

Machine Learning · Statistics 2025-11-14 Nathan Wycoff

Convergence of gradient flow for learning convolutional neural networks

Convolutional neural networks are widely used in imaging and image recognition. Learning such networks from training data leads to the minimization of a non-convex function. This makes the analysis of standard optimization methods such as…

Optimization and Control · Mathematics 2026-01-14 Jona-Maria Diederen , Holger Rauhut , Ulrich Terstiege

A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning

One of the most important parts of Artificial Neural Networks is minimizing the loss function, which tells us how good or bad our model is. To minimize this loss we need to tune the weights and biases. Also to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav

Online Importance Sampling for Stochastic Gradient Optimization

Machine learning optimization often depends on stochastic gradient descent, where the precision of gradient estimation is vital for model performance. Gradients are calculated from mini-batches formed by uniformly selecting data samples…

Machine Learning · Computer Science 2025-01-29 Corentin Salaün , Xingchang Huang , Iliyan Georgiev , Niloy J. Mitra , Gurprit Singh

LossVal: Efficient Data Valuation for Neural Networks

Assessing the importance of individual training samples is a key challenge in machine learning. Traditional approaches retrain models with and without specific samples, which is computationally expensive and ignores dependencies between…

Machine Learning · Computer Science 2024-12-18 Tim Wibiral , Mohamed Karim Belaid , Maximilian Rabus , Ansgar Scherp

On the rate of convergence of a neural network regression estimate learned by gradient descent

Nonparametric regression with random design is considered. Estimates are defined by minimizing a penalized empirical $L_2$ risk over a suitably chosen class of neural networks with one hidden layer via gradient descent. Here, the gradient…

Statistics Theory · Mathematics 2019-12-10 Alina Braun , Michael Kohler , Harro Walk

Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients

The gradients used to train neural networks are typically computed using backpropagation. While an efficient way to obtain exact gradients, backpropagation is computationally expensive, hinders parallelization, and is biologically…

Machine Learning · Computer Science 2026-01-14 Katharina Flügel , Daniel Coquelin , Marie Weiel , Charlotte Debus , Achim Streit , Markus Götz

Analysis of Natural Gradient Descent for Multilayer Neural Networks

Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods…

Disordered Systems and Neural Networks · Physics 2009-10-31 Magnus Rattray , David Saad

LipschitzLR: Using theoretically computed adaptive learning rates for fast convergence

Optimizing deep neural networks is largely thought to be an empirical process, requiring manual tuning of several hyper-parameters, such as learning rate, weight decay, and dropout rate. Arguably, the learning rate is the most important of…

Machine Learning · Computer Science 2020-08-04 Rahul Yedida , Snehanshu Saha , Tejas Prashanth

Channel-Directed Gradients for Optimization of Convolutional Neural Networks

We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error. The method requires only simple processing of existing stochastic…

Machine Learning · Computer Science 2020-08-26 Dong Lao , Peihao Zhu , Peter Wonka , Ganesh Sundaramoorthi

GradMetaNet: An Equivariant Architecture for Learning on Gradients

Gradients of neural networks encode valuable information for optimization, editing, and analysis of models. Therefore, practitioners often treat gradients as inputs to task-specific algorithms, e.g. for pruning or optimization. Recent works…

Machine Learning · Computer Science 2025-10-14 Yoav Gelberg , Yam Eitan , Aviv Navon , Aviv Shamsian , Theo Putterman , Michael Bronstein , Haggai Maron

Enhancing approximation abilities of neural networks by training derivatives

A method to increase the precision of feedforward networks is proposed. It requires prior knowledge of the target function's derivatives of several orders and uses this information in gradient-based training. Forward pass calculates not only…

Neural and Evolutionary Computing · Computer Science 2020-04-08 V. I. Avrutskiy

Variational Neural Networks: Every Layer and Neuron Can Be Unique

The choice of activation function can significantly influence the performance of neural networks. The lack of guiding principles for the selection of activation function is lamentable. We try to address this issue by introducing our…

Machine Learning · Computer Science 2018-10-16 Yiwei Li , Enzhi Li

Approximation and Gradient Descent Training with Neural Networks

It is well understood that neural networks with carefully hand-picked weights provide powerful function approximation and that they can be successfully trained in over-parametrized regimes. Since over-parametrization ensures zero training…

Machine Learning · Computer Science 2024-05-21 G. Welper

Not All Samples Are Created Equal: Deep Learning with Importance Sampling

Deep neural network training spends most of the computation on examples that are properly handled, and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on…

Machine Learning · Computer Science 2019-10-29 Angelos Katharopoulos , François Fleuret