Related papers: Robust Implicit Backpropagation

Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks

Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems, but they are still trapped in training failures when the target functions to be approximated exhibit…

Machine Learning · Computer Science 2023-03-06 Ye Li , Song-Can Chen , Sheng-Jun Huang

Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent

SGD is the widely adopted method to train CNN. Conceptually it approximates the population with a randomly sampled batch; then it evenly trains batches by conducting a gradient update on every batch in an epoch. In this paper, we…

Machine Learning · Computer Science 2017-03-29 Linnan Wang , Yi Yang , Martin Renqiang Min , Srimat Chakradhar

Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks

The optimization algorithms are crucial in training physics-informed neural networks (PINNs), as unsuitable methods may lead to poor solutions. Compared to the common gradient descent (GD) algorithm, implicit gradient descent (IGD)…

Machine Learning · Computer Science 2025-08-04 Xianliang Xu , Ting Du , Wang Kong , Bin Shan , Ye Li , Zhongyi Huang

Train Feedfoward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation

Stochastic gradient descent (SGD) has achieved great success in training deep neural network, where the gradient is computed through back-propagation. However, the back-propagated values of different layers vary dramatically. This…

Machine Learning · Statistics 2018-02-28 Huishuai Zhang , Wei Chen , Tie-Yan Liu

Semi-Implicit Back Propagation

Neural network has attracted great attention for a long time and many researchers are devoted to improve the effectiveness of neural network training algorithms. Though stochastic gradient descent (SGD) and other explicit gradient-based…

Optimization and Control · Mathematics 2020-02-11 Ren Liu , Xiaoqun Zhang

Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

Equilibrated adaptive learning rates for non-convex optimization

Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the…

Machine Learning · Computer Science 2015-09-01 Yann N. Dauphin , Harm de Vries , Yoshua Bengio

Efficient and Effective Implicit Dynamic Graph Neural Network

Implicit graph neural networks have gained popularity in recent years as they capture long-range dependencies while improving predictive performance in static graphs. Despite the tussle between performance degradation due to the…

Machine Learning · Computer Science 2024-06-27 Yongjian Zhong , Hieu Vu , Tianbao Yang , Bijaya Adhikari

Staleness-aware Async-SGD for Distributed Deep Learning

Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks. Stochastic Gradient Descent (SGD) is the preferred optimization algorithm for training these networks and asynchronous SGD…

Machine Learning · Computer Science 2016-04-06 Wei Zhang , Suyog Gupta , Xiangru Lian , Ji Liu

Input Normalized Stochastic Gradient Descent Training of Deep Neural Networks

In this paper, we propose a novel optimization algorithm for training machine learning models called Input Normalized Stochastic Gradient Descent (INSGD), inspired by the Normalized Least Mean Squares (NLMS) algorithm used in adaptive…

Machine Learning · Computer Science 2023-06-28 Salih Atici , Hongyi Pan , Ahmet Enis Cetin

Learning with Local Gradients at the Edge

To enable learning on edge devices with fast convergence and low memory, we present a novel backpropagation-free optimization algorithm dubbed Target Projection Stochastic Gradient Descent (tpSGD). tpSGD generalizes direct random target…

Machine Learning · Computer Science 2022-09-19 Michael Lomnitz , Zachary Daniels , David Zhang , Michael Piacentino

A Robust Adaptive Stochastic Gradient Method for Deep Learning

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre , Jose Sotelo , Marcin Moczulski , Yoshua Bengio

Improving Neural Network Training in Low Dimensional Random Bases

Stochastic Gradient Descent (SGD) has proven to be remarkably effective in optimizing deep neural networks that employ ever-larger numbers of parameters. Yet, improving the efficiency of large-scale optimization remains a vital and highly…

Machine Learning · Computer Science 2020-11-11 Frithjof Gressmann , Zach Eaton-Rosen , Carlo Luschi

Continual Backprop: Stochastic Gradient Descent with Persistent Randomness

The Backprop algorithm for learning in neural networks utilizes two mechanisms: first, stochastic gradient descent and second, initialization with small random weights, where the latter is essential to the effectiveness of the former. We…

Machine Learning · Computer Science 2022-05-06 Shibhansh Dohare , Richard S. Sutton , A. Rupam Mahmood

Faster Biological Gradient Descent Learning

Back-propagation is a popular machine learning algorithm that uses gradient descent in training neural networks for supervised learning, but can be very slow. A number of algorithms have been developed to speed up convergence and improve…

Neural and Evolutionary Computing · Computer Science 2020-09-29 Ho Ling Li

LPGD: A General Framework for Backpropagation through Embedded Optimization Layers

Embedding parameterized optimization problems as layers into machine learning architectures serves as a powerful inductive bias. Training such architectures with stochastic gradient descent requires care, as degenerate derivatives of the…

Machine Learning · Computer Science 2024-12-16 Anselm Paulus , Georg Martius , Vít Musil

A Hybrid Method for Training Convolutional Neural Networks

Artificial Intelligence algorithms have been steadily increasing in popularity and usage. Deep Learning, allows neural networks to be trained using huge datasets and also removes the need for human extracted features, as it automates the…

Neural and Evolutionary Computing · Computer Science 2020-05-11 Vasco Lopes , Paulo Fazendeiro

A Theoretical Framework for Inference Learning

Backpropagation (BP) is the most successful and widely used algorithm in deep learning. However, the computations required by BP are challenging to reconcile with known neurobiology. This difficulty has stimulated interest in more…

Neural and Evolutionary Computing · Computer Science 2022-06-02 Nick Alonso , Beren Millidge , Jeff Krichmar , Emre Neftci

Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions

Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing…

Machine Learning · Computer Science 2019-12-16 Yunwen Lei , Ting Hu , Guiying Li , Ke Tang

Iterative Implicit Gradients for Nonconvex Optimization with Variational Inequality Constraints

We propose an optimization proxy in terms of iterative implicit gradient methods for solving constrained optimization problems with nonconvex loss functions. This framework can be applied to a broad range of machine learning settings,…

Optimization and Control · Mathematics 2025-10-14 Harshal D. Kaushik , Ming Jin