Related papers: On Training Implicit Models

Intelligent gradient amplification for deep neural networks

Deep learning models offer superior performance compared to other machine learning techniques for a variety of tasks and domains, but pose their own challenges. In particular, deep learning models require larger training times as the depth…

Machine Learning · Computer Science 2023-05-31 Sunitha Basodi , Krishna Pusuluri , Xueli Xiao , Yi Pan

Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients

The gradients used to train neural networks are typically computed using backpropagation. While an efficient way to obtain exact gradients, backpropagation is computationally expensive, hinders parallelization, and is biologically…

Machine Learning · Computer Science 2026-01-14 Katharina Flügel , Daniel Coquelin , Marie Weiel , Charlotte Debus , Achim Streit , Markus Götz

Implicit Gradient Regularization

Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient…

Machine Learning · Computer Science 2022-07-20 David G. T. Barrett , Benoit Dherin

Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks

The optimization algorithms are crucial in training physics-informed neural networks (PINNs), as unsuitable methods may lead to poor solutions. Compared to the common gradient descent (GD) algorithm, implicit gradient descent (IGD)…

Machine Learning · Computer Science 2025-08-04 Xianliang Xu , Ting Du , Wang Kong , Bin Shan , Ye Li , Zhongyi Huang

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks". This is the simplest model displaying a transition between…

Machine Learning · Computer Science 2020-07-15 Edward Moroshko , Suriya Gunasekar , Blake Woodworth , Jason D. Lee , Nathan Srebro , Daniel Soudry

Implicit Reparameterization Gradients

By providing a simple and efficient way of computing low-variance gradients of continuous random variables, the reparameterization trick has become the technique of choice for training a variety of latent variable models. However, it is not…

Machine Learning · Computer Science 2019-01-31 Michael Figurnov , Shakir Mohamed , Andriy Mnih

Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models

We propose a new technique that boosts the convergence of training generative adversarial networks. Generally, the rate of training deep models reduces severely after multiple iterations. A key reason for this phenomenon is that a deep…

Machine Learning · Statistics 2018-06-15 Atsushi Nitanda , Taiji Suzuki

On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers

A deep equilibrium model uses implicit layers, which are implicitly defined through an equilibrium point of an infinite sequence of computation. It avoids any explicit computation of the infinite sequence by finding an equilibrium point…

Machine Learning · Computer Science 2021-02-19 Kenji Kawaguchi

Gradient Estimators for Implicit Models

Implicit models, which allow for the generation of samples but not for point-wise evaluation of probabilities, are omnipresent in real-world problems tackled by machine learning and a hot topic of current research. Some examples include…

Machine Learning · Statistics 2018-04-27 Yingzhen Li , Richard E. Turner

Implicit Gradient Neural Networks with a Positive-Definite Mass Matrix for Online Linear Equations Solving

Motivated by the advantages achieved by implicit analogue net for solving online linear equations, a novel implicit neural model is designed based on conventional explicit gradient neural networks in this letter by introducing a…

Neural and Evolutionary Computing · Computer Science 2017-03-20 Ke Chen

The Fine-Grained Complexity of Gradient Computation for Training Large Language Models

Large language models (LLMs) have made fundamental contributions over the last a few years. To train an LLM, one needs to alternatingly run `forward' computations and `backward' computations. The forward computation can be viewed as…

Machine Learning · Computer Science 2024-02-08 Josh Alman , Zhao Song

Gradients as Features for Deep Representation Learning

We address the challenging problem of deep representation learning--the efficient adaption of a pre-trained deep network to different tasks. Specifically, we propose to explore gradient-based features. These features are gradients of the…

Machine Learning · Computer Science 2020-04-14 Fangzhou Mu , Yingyu Liang , Yin Li

Gradients without Backpropagation

Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic…

Machine Learning · Computer Science 2022-02-18 Atılım Güneş Baydin , Barak A. Pearlmutter , Don Syme , Frank Wood , Philip Torr

Gradient Descent as a Perceptron Algorithm: Understanding Dynamics and Implicit Acceleration

Even for the gradient descent (GD) method applied to neural network training, understanding its optimization dynamics, including convergence rate, iterate trajectories, function value oscillations, and especially its implicit acceleration,…

Machine Learning · Computer Science 2026-05-22 Alexander Tyurin

A Novel Method for improving accuracy in neural network by reinstating traditional back propagation technique

Deep learning has revolutionized industries like computer vision, natural language processing, and speech recognition. However, back propagation, the main method for training deep neural networks, faces challenges like computational…

Machine Learning · Computer Science 2023-08-15 Gokulprasath R

Continuous Deep Equilibrium Models: Training Neural ODEs faster by integrating them to Infinity

Implicit models separate the definition of a layer from the description of its solution process. While implicit layers allow features such as depth to adapt to new scenarios and inputs automatically, this adaptivity makes its computational…

Machine Learning · Computer Science 2023-03-06 Avik Pal , Alan Edelman , Christopher Rackauckas

Gradient Descent based Optimization Algorithms for Deep Learning Models Training

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…

Machine Learning · Computer Science 2019-03-12 Jiawei Zhang

Analyzing Inexact Hypergradients for Bilevel Learning

Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters…

Optimization and Control · Mathematics 2023-11-16 Matthias J. Ehrhardt , Lindon Roberts

Deep Progressive Training: scaling up depth capacity of zero/one-layer models

Model depth is a double-edged sword in deep learning: deeper models achieve higher accuracy but require higher computational cost. To efficiently train models at scale, an effective strategy is the progressive training, which scales up…

Machine Learning · Computer Science 2025-11-10 Zhiqi Bu

On Training Implicit Meta-Learning With Applications to Inductive Weighing in Consistency Regularization

Meta-learning that uses implicit gradient have provided an exciting alternative to standard techniques which depend on the trajectory of the inner loop training. Implicit meta-learning (IML), however, require computing $2^{nd}$ order…

Machine Learning · Computer Science 2023-10-31 Fady Rezk