Related papers: A Control Theoretic Framework for Adaptive Gradien…

AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) and accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks…

Machine Learning · Computer Science 2020-12-22 Juntang Zhuang, Tommy Tang, Yifan Ding, Sekhar Tatikonda, Nicha Dvornek, Xenophon Papademetris, James S. Duncan

The Marginal Value of Adaptive Gradient Methods in Machine Learning

Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We…

Machine Learning · Statistics 2018-05-23 Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

Adaptive gradient methods, which adopt historical gradient information to automatically adjust the learning rate, despite the nice property of fast convergence, have been observed to generalize worse than stochastic gradient descent (SGD)…

Machine Learning · Computer Science 2020-06-24 Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

AdaNorm: Adaptive Gradient Norm Correction based Optimizer for CNNs

The stochastic gradient descent (SGD) optimizers are generally used to train the convolutional neural networks (CNNs). In recent years, several adaptive momentum based SGD optimizers have been introduced, such as Adam, diffGrad, RAdam and…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Shiv Ram Dubey, Satish Kumar Singh, Bidyut Baran Chaudhuri

Adaptive Gradient Method with Resilience and Momentum

Several variants of stochastic gradient descent (SGD) have been proposed to improve the learning effectiveness and efficiency when training deep neural networks, among which some recent influential attempts would like to adaptively control…

Machine Learning · Computer Science 2020-10-22 Jie Liu, Chen Lin, Chuming Li, Lu Sheng, Ming Sun, Junjie Yan, Wanli Ouyang

Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks

Stochastic gradient descent (SGD) is the main approach for training deep networks: it moves towards the optimum of the cost function by iteratively updating the parameters of a model in the direction of the gradient of the loss evaluated on…

Machine Learning · Computer Science 2021-03-30 Loris Nanni, Gianluca Maguolo, Alessandra Lumini

Adaptive Gradient Methods with Dynamic Bound of Learning Rate

Adaptive optimization methods such as AdaGrad, RMSprop and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. Though prevailing, they are observed to generalize poorly compared…

Machine Learning · Computer Science 2019-04-22 Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun

On the Convergence of Decentralized Adaptive Gradient Methods

Adaptive gradient methods including Adam, AdaGrad, and their variants have been very successful for training deep learning models, such as neural networks. Meanwhile, given the need for distributed computing, distributed optimization…

Machine Learning · Computer Science 2021-09-08 Xiangyi Chen, Belhal Karimi, Weijie Zhao, Ping Li

On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization

This paper studies a class of adaptive gradient based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as the "Adam-type", includes the popular…

Machine Learning · Computer Science 2019-03-12 Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong

A New Adaptive Gradient Method with Gradient Decomposition

Adaptive gradient methods, especially Adam-type methods (such as Adam, AMSGrad, and AdaBound), have been proposed to speed up the training process with an element-wise scaling term on learning rates. However, they often generalize poorly…

Machine Learning · Computer Science 2021-07-20 Zhou Shao, Tong Lin

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

Adaptive gradient methods are workhorses in deep learning. However, the convergence guarantees of adaptive gradient methods for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence…

Machine Learning · Computer Science 2024-06-21 Dongruo Zhou, Jinghui Chen, Yuan Cao, Ziyan Yang, Quanquan Gu

Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective

Accelerated gradient-based methods are being extensively used for solving non-convex machine learning problems, especially when the data points are abundant or the available data is distributed across several agents. Two of the prominent…

Machine Learning · Computer Science 2021-10-04 Kushal Chakrabarti, Nikhil Chopra

Rethinking Adam: A Twofold Exponential Moving Average Approach

Adaptive gradient methods, e.g. Adam, have achieved tremendous success in machine learning. Scaling the learning rate element-wisely by a certain form of second moment estimate of gradients, such methods are able to attain rapid…

Machine Learning · Computer Science 2022-02-10 Yizhou Wang, Yue Kang, Can Qin, Huan Wang, Yi Xu, Yulun Zhang, Yun Fu

SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients

Adaptive gradient methods have shown excellent performances for solving many machine learning problems. Although multiple adaptive gradient methods were recently studied, they mainly focus on either empirical or theoretical aspects and also…

Optimization and Control · Mathematics 2022-05-13 Feihu Huang, Junyi Li, Heng Huang

Adam: A Method for Stochastic Optimization

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has…

Machine Learning · Computer Science 2017-01-31 Diederik P. Kingma, Jimmy Ba

On the Convergence of AdaBound and its Connection to SGD

Adaptive gradient methods such as Adam have gained extreme popularity due to their success in training complex neural networks and less sensitivity to hyperparameter tuning compared to SGD. However, it has been recently shown that Adam can…

Machine Learning · Computer Science 2019-12-11 Pedro Savarese

Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM

Adaptive gradient methods (AGMs) have become popular in optimizing the nonconvex problems in deep learning area. We revisit AGMs and identify that the adaptive learning rate (A-LR) used by AGMs varies significantly across the dimensions of…

Machine Learning · Computer Science 2019-09-12 Qianqian Tong, Guannan Liang, Jinbo Bi

diffGrad: An Optimization Method for Convolutional Neural Networks

Stochastic Gradient Descent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic…

Machine Learning · Computer Science 2021-11-30 Shiv Ram Dubey, Soumendu Chakraborty, Swalpa Kumar Roy, Snehasis Mukherjee, Satish Kumar Singh, Bidyut Baran Chaudhuri

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization

Adaptive gradient methods such as Adam have gained increasing popularity in deep learning optimization. However, it has been observed that compared with (stochastic) gradient descent, Adam can converge to a different solution with a…

Machine Learning · Computer Science 2021-08-26 Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

AdaSGD: Bridging the gap between SGD and Adam

In the context of stochastic gradient descent (SGD) and adaptive moment estimation (Adam), researchers have recently proposed optimization techniques that transition from Adam to SGD with the goal of improving both convergence and…

Machine Learning · Computer Science 2020-07-01 Jiaxuan Wang, Jenna Wiens