Related papers: Semi-Implicit Back Propagation

Masked Training of Neural Networks with Partial Gradients

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

A Theoretical Framework for Inference Learning

Backpropagation (BP) is the most successful and widely used algorithm in deep learning. However, the computations required by BP are challenging to reconcile with known neurobiology. This difficulty has stimulated interest in more…

Neural and Evolutionary Computing · Computer Science 2022-06-02 Nick Alonso, Beren Millidge, Jeff Krichmar, Emre Neftci

Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models

We propose a memory efficient method, named Stochastic Backpropagation (SBP), for training deep neural networks on videos. It is based on the finding that gradients from incomplete execution for backpropagation can still effectively train…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Feng Cheng, Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Li, Wei Xia

Train Feedfoward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation

Stochastic gradient descent (SGD) has achieved great success in training deep neural networks, where the gradient is computed through back-propagation. However, the back-propagated values of different layers vary dramatically. This…

Machine Learning · Statistics 2018-02-28 Huishuai Zhang, Wei Chen, Tie-Yan Liu

Improving Neural Network Training in Low Dimensional Random Bases

Stochastic Gradient Descent (SGD) has proven to be remarkably effective in optimizing deep neural networks that employ ever-larger numbers of parameters. Yet, improving the efficiency of large-scale optimization remains a vital and highly…

Machine Learning · Computer Science 2020-11-11 Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi

Robust Implicit Backpropagation

Arguably the biggest challenge in applying neural networks is tuning the hyperparameters, in particular the learning rate. The sensitivity to the learning rate is due to the reliance on backpropagation to train the network. In this paper we…

Machine Learning · Statistics 2018-08-08 Francois Fagan, Garud Iyengar

Proximal Backpropagation

We propose proximal backpropagation (ProxProp) as a novel algorithm that takes implicit instead of explicit gradient steps to update the network parameters during neural network training. Our algorithm is motivated by the step size…

Machine Learning · Computer Science 2018-02-21 Thomas Frerix, Thomas Möllenhoff, Michael Moeller, Daniel Cremers

An In-depth Study of Stochastic Backpropagation

In this paper, we provide an in-depth study of Stochastic Backpropagation (SBP) when training deep neural networks for standard image classification and object detection tasks. During backward propagation, SBP calculates the gradients by…

Computer Vision and Pattern Recognition · Computer Science 2022-10-04 Jun Fang, Mingze Xu, Hao Chen, Bing Shuai, Zhuowen Tu, Joseph Tighe

Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training

Training Deep Neural Networks (DNNs) with small batches using Stochastic Gradient Descent (SGD) yields superior test performance compared to larger batches. The specific noise structure inherent to SGD is known to be responsible for this…

Machine Learning · Statistics 2024-02-14 Tom Sander, Maxime Sylvestre, Alain Durmus

SGD with Large Step Sizes Learns Sparse Features

We showcase important features of the dynamics of the Stochastic Gradient Descent (SGD) in the training of neural networks. We present empirical observations that commonly used large step sizes (i) lead the iterates to jump from one side of…

Machine Learning · Computer Science 2023-06-08 Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion

Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State

Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware. However, the supervised training of SNNs remains a hard problem due to the discontinuity of the spiking neuron…

Neural and Evolutionary Computing · Computer Science 2021-12-20 Mingqing Xiao, Qingyan Meng, Zongpeng Zhang, Yisen Wang, Zhouchen Lin

Take A Shortcut Back: Mitigating the Gradient Vanishing for Training Spiking Neural Networks

The Spiking Neural Network (SNN) is a biologically inspired neural network infrastructure that has recently garnered significant attention. It utilizes binary spike activations to transmit information, thereby replacing multiplications with…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Yufei Guo, Yuanpei Chen, Zecheng Hao, Weihang Peng, Zhou Jie, Yuhan Zhang, Xiaode Liu, Zhe Ma

Adaptive Bidirectional Backpropagation: Towards Biologically Plausible Error Signal Transmission in Neural Networks

The back-propagation (BP) algorithm has been considered the de-facto method for training deep neural networks. It back-propagates errors from the output layer to the hidden layers in an exact manner using the transpose of the feedforward…

Neural and Evolutionary Computing · Computer Science 2018-05-01 Hongyin Luo, Jie Fu, James Glass

Stochastic Markov Gradient Descent and Training Low-Bit Neural Networks

The massive size of modern neural networks has motivated substantial recent interest in neural network quantization. We introduce Stochastic Markov Gradient Descent (SMGD), a discrete optimization method applicable to training quantized…

Machine Learning · Computer Science 2020-12-23 Jonathan Ashbrock, Alexander M. Powell

Noise Balance and Stationary Distribution of Stochastic Gradient Descent

The stochastic gradient descent (SGD) algorithm is the algorithm we use to train neural networks. However, it remains poorly understood how the SGD navigates the highly nonlinear and degenerate loss landscape of a neural network. In this…

Machine Learning · Computer Science 2025-06-13 Liu Ziyin, Hongchao Li, Masahito Ueda

Memorized Sparse Backpropagation

Neural network learning is usually time-consuming since backpropagation needs to compute full gradients and backpropagate them across multiple layers. Despite the success of existing works in accelerating propagation through sparseness, the…

Machine Learning · Computer Science 2020-10-28 Zhiyuan Zhang, Pengcheng Yang, Xuancheng Ren, Qi Su, Xu Sun

Direct Feedback Alignment Provides Learning in Deep Neural Networks

Artificial neural networks are most commonly trained with the back-propagation algorithm, where the gradient for learning is provided by back-propagating the error, layer by layer, from the output layer to the hidden layers. A recently…

Machine Learning · Statistics 2016-12-22 Arild Nøkland

Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks

Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems, but they are still trapped in training failures when the target functions to be approximated exhibit…

Machine Learning · Computer Science 2023-03-06 Ye Li, Song-Can Chen, Sheng-Jun Huang

Deep Gradient Boosting -- Layer-wise Input Normalization of Neural Networks

Stochastic gradient descent (SGD) has been the dominant optimization method for training deep neural networks due to its many desirable properties. One of the more remarkable and least understood qualities of SGD is that it generalizes…

Machine Learning · Computer Science 2020-07-03 Erhan Bilal