Related papers: Conjugate-Gradient-like Based Adaptive Moment Esti…

Conjugate-gradient-based Adam for stochastic optimization and its application to deep learning

This paper proposes a conjugate-gradient-based Adam algorithm blending Adam with nonlinear conjugate gradient methods and shows its convergence analysis. Numerical experiments on text classification and image classification show that the…

Optimization and Control · Mathematics 2020-03-04 Yu Kobayashi, Hideaki Iiduka

Revisiting the Initial Steps in Adaptive Gradient Descent Optimization

Adaptive gradient optimization methods, such as Adam, are prevalent in training deep neural networks across diverse machine learning tasks due to their ability to achieve faster convergence. However, these methods often suffer from…

Machine Learning · Computer Science 2025-02-12 Abulikemu Abuduweili, Changliu Liu

Rethinking Adam: A Twofold Exponential Moving Average Approach

Adaptive gradient methods, e.g. Adam, have achieved tremendous success in machine learning. Scaling the learning rate element-wisely by a certain form of second moment estimate of gradients, such methods are able to attain rapid…

Machine Learning · Computer Science 2022-02-10 Yizhou Wang, Yue Kang, Can Qin, Huan Wang, Yi Xu, Yulun Zhang, Yun Fu

Adaptive Moment Estimation Optimization Algorithm Using Projection Gradient for Deep Learning

Training deep neural networks is challenging. To accelerate training and enhance performance, we propose PadamP, a novel optimization algorithm. PadamP is derived by applying the adaptive estimation of the p-th power of the second-order…

Optimization and Control · Mathematics 2025-03-14 Yongqi Li, Xiaowei Zhang

Adaptive Learning Rate and Momentum for Training Deep Neural Networks

Recent progress on deep learning relies heavily on the quality and efficiency of training algorithms. In this paper, we develop a fast training method motivated by the nonlinear Conjugate Gradient (CG) framework. We propose the Conjugate…

Machine Learning · Computer Science 2021-07-28 Zhiyong Hao, Yixuan Jiang, Huihua Yu, Hsiao-Dong Chiang

On Higher-order Moments in Adam

In this paper, we investigate the popular deep learning optimization routine, Adam, from the perspective of statistical moments. While Adam is an adaptive lower-order moment based (of the stochastic gradient) method, we propose an extension…

Machine Learning · Computer Science 2019-10-16 Zhanhong Jiang, Aditya Balu, Sin Yong Tan, Young M Lee, Chinmay Hegde, Soumik Sarkar

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

Adaptive gradient methods, which adopt historical gradient information to automatically adjust the learning rate, despite the nice property of fast convergence, have been observed to generalize worse than stochastic gradient descent (SGD)…

Machine Learning · Computer Science 2020-06-24 Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Rapidly Adapting Moment Estimation

Adaptive gradient methods such as Adam have been shown to be very effective for training deep neural networks (DNNs) by tracking the second moment of gradients to compute the individual learning rates. Differently from existing methods, we…

Machine Learning · Computer Science 2019-02-26 Guoqiang Zhang, Kenta Niwa, W. Bastiaan Kleijn

Improving Adaptive Moment Optimization via Preconditioner Diagonalization

Modern adaptive optimization methods, such as Adam and its variants, have emerged as the most widely used tools in deep learning over recent years. These algorithms offer automatic mechanisms for dynamically adjusting the update step based…

Machine Learning · Computer Science 2025-02-12 Son Nguyen, Bo Liu, Lizhang Chen, Qiang Liu

On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization

This paper studies a class of adaptive gradient based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as the "Adam-type", includes the popular…

Machine Learning · Computer Science 2019-03-12 Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong

UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic Optimization

Adam-type algorithms have become a preferred choice for optimisation in the deep learning setting, however, despite success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type…

Machine Learning · Computer Science 2024-09-24 Yiming Jiang, Jinlan Liu, Dongpo Xu, Danilo P. Mandic

Dual Averaging is Surprisingly Effective for Deep Learning Optimization

First-order stochastic optimization methods are currently the most widely used class of methods for training deep neural networks. However, the choice of the optimizer has become an ad-hoc rule that can significantly affect the performance.…

Machine Learning · Computer Science 2020-10-21 Samy Jelassi, Aaron Defazio

Combining Adam and its Inverse Counterpart to Enhance Generalization of Deep Learning Optimizers

In the training of neural networks, adaptive moment estimation (Adam) typically converges fast but exhibits suboptimal generalization performance. A widely accepted explanation for its defect in generalization is that it often tends to…

Machine Learning · Computer Science 2026-03-10 Tao Shi, Liangming Chen, Long Jin, Mengchu Zhou

Accelerated Gradient Descent for Faster Convergence with Minimal Overhead

In this paper, we present CT-AGD (Curvature-Tuned Accelerated Gradient Descent), an optimization method for non-convex optimization problems in deep learning training tasks. CT-AGD is a general boosting procedure that accelerates…

Machine Learning · Computer Science 2026-05-18 Manuel Graca, L. Miguel Silveira, Arlindo Oliveira, Frank Liu

GADAM: Genetic-Evolutionary ADAM for Deep Neural Network Optimization

Deep neural network learning can be formulated as a non-convex optimization problem. Existing optimization algorithms, e.g., Adam, can learn the models fast, but may get stuck in local optima easily. In this paper, we introduce a novel…

Machine Learning · Computer Science 2019-03-12 Jiawei Zhang, Fisher B. Gouza

A New Adaptive Gradient Method with Gradient Decomposition

Adaptive gradient methods, especially Adam-type methods (such as Adam, AMSGrad, and AdaBound), have been proposed to speed up the training process with an element-wise scaling term on learning rates. However, they often generalize poorly…

Machine Learning · Computer Science 2021-07-20 Zhou Shao, Tong Lin

A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD

Adaptive Moment Estimation (Adam) is a cornerstone optimization algorithm in deep learning, widely recognized for its flexibility with adaptive learning rates and efficiency in handling large-scale data. However, despite its practical…

Machine Learning · Computer Science 2025-05-21 Ruinan Jin, Xiao Li, Yaoliang Yu, Baoxiang Wang

DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization

Adaptive gradient-based optimization methods such as Adagrad, Rmsprop, and Adam are widely used in solving large-scale machine learning problems including deep learning. A number of schemes have been proposed in…

Machine Learning · Computer Science 2019-05-30 Parvin Nazari, Davoud Ataee Tarzanagh, George Michailidis

Moment Centralization based Gradient Descent Optimizers for Convolutional Neural Networks

Convolutional neural networks (CNNs) have shown very appealing performance for many computer vision applications. The training of CNNs is generally performed using stochastic gradient descent (SGD) based optimization techniques. The…

Computer Vision and Pattern Recognition · Computer Science 2022-07-20 Sumanth Sadu, Shiv Ram Dubey, SR Sreeja

Adam$^+$: A Stochastic Method with Adaptive Variance Reduction

Adam is a widely used stochastic optimization method for deep learning applications. While practitioners prefer Adam because it requires less parameter tuning, its use is problematic from a theoretical point of view since it may not…

Machine Learning · Computer Science 2020-11-25 Mingrui Liu, Wei Zhang, Francesco Orabona, Tianbao Yang