Related papers: Gradient Correction beyond Gradient Descent

200 papers

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…

Machine Learning · Computer Science 2019-03-12 Jiawei Zhang

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

We present multiplexed gradient descent (MGD), a gradient descent framework designed to easily train analog or digital neural networks in hardware. MGD utilizes zero-order optimization techniques for online training of hardware neural…

Machine Learning · Computer Science 2023-08-17 Adam N. McCaughan, Bakhrom G. Oripov, Natesh Ganesh, Sae Woo Nam, Andrew Dienstfrey, Sonia M. Buckley

The simplicity of gradient descent (GD) made it the default method for training ever-deeper and complex neural networks. Both loss functions and architectures are often explicitly tuned to be amenable to this basic local optimization. In…

Machine Learning · Computer Science 2019-04-30 Dmitrii Marin, Meng Tang, Ismail Ben Ayed, Yuri Boykov

We consider the dynamics of gradient descent (GD) in overparameterized single hidden layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions, the parameter values obtained using GD achieve…

Machine Learning · Computer Science 2021-05-17 Siddhartha Satpathi, R Srikant

Deep neural networks (DNN) are typically optimized using stochastic gradient descent (SGD). However, the estimation of the gradient using stochastic samples tends to be noisy and unreliable, resulting in large gradient variance and bad…

Machine Learning · Computer Science 2021-05-18 Xingyi Yang

One of the most important parts of Artificial Neural Networks is minimizing the loss function, which tells us how good or bad our model is. To minimize this loss, we need to tune the weights and biases. Also, to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav

Stochastic gradient descent (SGD) has achieved great success in training deep neural networks, where the gradient is computed through back-propagation. However, the back-propagated values of different layers vary dramatically. This…

Machine Learning · Statistics 2018-02-28 Huishuai Zhang, Wei Chen, Tie-Yan Liu

In the last decade, deep learning has become a major component of artificial intelligence. The workhorse of deep learning is the optimization of loss functions by stochastic gradient descent (SGD). Traditionally in deep learning, neural…

Machine Learning · Computer Science 2021-04-27 Benjamin Scellier

A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based…

Machine Learning · Statistics 2025-03-19 Logan Engstrom, Andrew Ilyas, Benjamin Chen, Axel Feldmann, William Moses, Aleksander Madry

Gradient clipping is commonly used in training deep neural networks partly due to its practicability in relieving the exploding gradient problem. Recently, Zhang et al. (2019) show that clipped (stochastic) Gradient Descent (GD)…

Machine Learning · Computer Science 2020-10-30 Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang

Stochastic gradient descent (SGD) has been the dominant optimization method for training deep neural networks due to its many desirable properties. One of the more remarkable and least understood qualities of SGD is that it generalizes…

Machine Learning · Computer Science 2020-07-03 Erhan Bilal

Even for the gradient descent (GD) method applied to neural network training, understanding its optimization dynamics, including convergence rate, iterate trajectories, function value oscillations, and especially its implicit acceleration,…

Machine Learning · Computer Science 2026-05-22 Alexander Tyurin

The training of machine learning models is typically carried out using some form of gradient descent, often with great success. However, non-asymptotic analyses of first-order optimization algorithms typically employ a gradient smoothness…

Machine Learning · Computer Science 2024-06-18 Thomas Flynn

Deep learning is typically performed by learning a neural network solely from data in the form of input-output pairs ignoring available domain knowledge. In this work, the Constraint Guided Gradient Descent (CGGD) framework is proposed that…

Artificial Intelligence · Computer Science 2022-06-15 Quinten Van Baelen, Peter Karsmakers

Gradient descent has been a central training principle for artificial neural networks from the early beginnings to today's deep learning networks. The most common implementation is the backpropagation algorithm for training feed-forward…

Machine Learning · Computer Science 2020-06-09 Stefan Jaeger

We study the complexity of training neural network models with one hidden nonlinear activation layer and an output weighted sum layer. We analyze Gradient Descent applied to learning a bounded target function on $n$ real-valued inputs. We…

Machine Learning · Computer Science 2019-05-28 Santosh Vempala, John Wilmes

Stochastic gradient descent (SGD) is a standard optimization method to minimize a training error with respect to network parameters in modern neural network learning. However, it typically suffers from proliferation of saddle points in the…

Machine Learning · Computer Science 2017-11-23 Haiping Huang, Taro Toyoizumi

Optimization algorithms are crucial in training physics-informed neural networks (PINNs), as unsuitable methods may lead to poor solutions. Compared to the common gradient descent (GD) algorithm, implicit gradient descent (IGD)…

Machine Learning · Computer Science 2025-08-04 Xianliang Xu, Ting Du, Wang Kong, Bin Shan, Ye Li, Zhongyi Huang

The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate…

Machine Learning · Computer Science 2025-10-14 Nikola Surjanovic, Alexandre Bouchard-Côté, Trevor Campbell