Related papers: Optimization Methods in Deep Learning: A Comprehen…

The Marginal Value of Adaptive Gradient Methods in Machine Learning

Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We…

Machine Learning · Statistics 2018-05-23 Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht

Gradient Descent based Optimization Algorithms for Deep Learning Models Training

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…

Machine Learning · Computer Science 2019-03-12 Jiawei Zhang

Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations

Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam,…

Machine Learning · Computer Science 2026-04-15 Tong Zhang, Jiangning Zhang, Zhucun Xue, Juntao Jiang, Yicheng Xu, Chengming Xu, Teng Hu, Xingyu Xie, Xiaobin Hu, Yabiao Wang, Yong Liu, Shuicheng Yan

Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization

We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods. MADGRAD shows excellent performance on deep learning optimization problems from multiple fields, including classification and…

Machine Learning · Computer Science 2021-08-27 Aaron Defazio, Samy Jelassi

Adaptive Gradient Regularization: A Faster and Generalizable Optimization Technique for Deep Neural Networks

Stochastic optimization plays a crucial role in the advancement of deep learning technologies. Over the decades, significant effort has been dedicated to improving the training efficiency and robustness of deep neural networks, via various…

Machine Learning · Computer Science 2024-08-21 Huixiu Jiang, Ling Yang, Yu Bao, Rutong Si, Sikun Yang

Sharp higher order convergence rates for the Adam optimizer

Gradient descent based optimization methods are the methods of choice to train deep neural networks in machine learning. Beyond the standard gradient descent method, also suitable modified variants of standard gradient descent involving…

Optimization and Control · Mathematics 2025-04-29 Steffen Dereich, Arnulf Jentzen, Adrian Riekert

Adaptive Gradient Methods with Dynamic Bound of Learning Rate

Adaptive optimization methods such as AdaGrad, RMSprop and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. Though prevailing, they are observed to generalize poorly compared…

Machine Learning · Computer Science 2019-04-22 Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun

A Comparison of Optimization Algorithms for Deep Learning

In recent years, we have witnessed the rise of deep learning. Deep neural networks have proved their success in many areas. However, the optimization of these networks has become more difficult as neural networks go deeper and datasets…

Machine Learning · Computer Science 2020-08-05 Derya Soydaner

Adaptive Gradient Method with Resilience and Momentum

Several variants of stochastic gradient descent (SGD) have been proposed to improve the learning effectiveness and efficiency when training deep neural networks, among which some recent influential attempts would like to adaptively control…

Machine Learning · Computer Science 2020-10-22 Jie Liu, Chen Lin, Chuming Li, Lu Sheng, Ming Sun, Junjie Yan, Wanli Ouyang

Revisiting the Initial Steps in Adaptive Gradient Descent Optimization

Adaptive gradient optimization methods, such as Adam, are prevalent in training deep neural networks across diverse machine learning tasks due to their ability to achieve faster convergence. However, these methods often suffer from…

Machine Learning · Computer Science 2025-02-12 Abulikemu Abuduweili, Changliu Liu

Non-convergence to the optimal risk for Adam and stochastic gradient descent optimization in the training of deep neural networks

Despite the omnipresent use of stochastic gradient descent (SGD) optimization methods in the training of deep neural networks (DNNs), it remains, in basically all practically relevant scenarios, a fundamental open problem to provide a…

Machine Learning · Computer Science 2025-03-04 Thang Do, Arnulf Jentzen, Adrian Riekert

Memory-Efficient Adaptive Optimization

Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling. However, these methods maintain second-order statistics for each parameter,…

Machine Learning · Computer Science 2019-09-13 Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

diffGrad: An Optimization Method for Convolutional Neural Networks

Stochastic Gradient Descent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic…

Machine Learning · Computer Science 2021-11-30 Shiv Ram Dubey, Soumendu Chakraborty, Swalpa Kumar Roy, Snehasis Mukherjee, Satish Kumar Singh, Bidyut Baran Chaudhuri

Appropriate Learning Rates of Adaptive Learning Rate Optimization Algorithms for Training Deep Neural Networks

This paper deals with nonconvex stochastic optimization problems in deep learning and provides appropriate learning rates with which adaptive learning rate optimization algorithms, such as Adam and AMSGrad, can approximate a stationary…

Optimization and Control · Mathematics 2020-11-24 Hideaki Iiduka

Optimization Methods for Large-Scale Machine Learning

This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural…

Machine Learning · Statistics 2018-02-12 Léon Bottou, Frank E. Curtis, Jorge Nocedal

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

Adaptive gradient methods are workhorses in deep learning. However, the convergence guarantees of adaptive gradient methods for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence…

Machine Learning · Computer Science 2024-06-21 Dongruo Zhou, Jinghui Chen, Yuan Cao, Ziyan Yang, Quanquan Gu

Tom: Leveraging trend of the observed gradients for faster convergence

The success of deep learning can be attributed to various factors such as increase in computational power, large datasets, deep convolutional neural networks, optimizers etc. Particularly, the choice of optimizer affects the generalization,…

Machine Learning · Computer Science 2021-09-10 Anirudh Maiya, Inumella Sricharan, Anshuman Pandey, Srinivas K. S

AdaNorm: Adaptive Gradient Norm Correction based Optimizer for CNNs

The stochastic gradient descent (SGD) optimizers are generally used to train the convolutional neural networks (CNNs). In recent years, several adaptive momentum-based SGD optimizers have been introduced, such as Adam, diffGrad, RAdam and…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Shiv Ram Dubey, Satish Kumar Singh, Bidyut Baran Chaudhuri

Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning

The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning. It is written with an INFORMS audience in mind, specifically…

Machine Learning · Statistics 2017-07-03 Frank E. Curtis, Katya Scheinberg
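Several of the abstracts above refer to the same mechanism: adaptive methods such as AdaGrad, RMSprop and Adam scale the learning rate element-wise using per-parameter gradient statistics, which is also why they keep extra state for every parameter. What follows is a minimal, illustrative sketch of the standard Adam update in plain Python. It is not taken from any of the listed papers. The function name `adam_step` is hypothetical, and the default hyperparameters (`lr=1e-3`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are commonly used values assumed for the example.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return updated (params, m, v) after one Adam step at iteration t >= 1.

    Hypothetical helper for illustration. Each coordinate gets its own
    effective step size lr / (sqrt(v_hat) + eps): the element-wise scaling
    that adaptive gradient methods are built on.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # running mean of gradients (momentum)
        v_i = beta2 * v_i + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m_i / (1 - beta1 ** t)             # bias correction for zero initialization
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Badly scaled quadratic f(x, y) = x**2 + 100 * y**2, with gradient (2x, 200y).
params = [1.0, 1.0]
m = [0.0, 0.0]  # per-parameter optimizer state: the extra memory adaptive methods carry
v = [0.0, 0.0]
grads = [2 * params[0], 200 * params[1]]
params, m, v = adam_step(params, grads, m, v, t=1)
print([round(p, 6) for p in params])  # → [0.999, 0.999]
```

On this example, a plain gradient step with the same `lr` would move `y` 100 times further than `x`. Adam's first step moves both coordinates by roughly `lr` because each gradient is divided by its own running magnitude. AdaGrad and RMSprop differ in how the squared-gradient statistic is accumulated (a running sum versus an exponential moving average) but share this per-coordinate scaling.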