Related papers: Adam: A Method for Stochastic Optimization

200 papers

In this work, to efficiently help escape the stationary and saddle points, we propose, analyze, and generalize a stochastic strategy performed as an operator for a first-order gradient descent algorithm in order to increase the target…

Machine Learning · Computer Science 2022-05-23 Wei Zhang, Yu Bao

Adam is a widely used stochastic optimization method for deep learning applications. While practitioners prefer Adam because it requires less parameter tuning, its use is problematic from a theoretical point of view since it may not…

Machine Learning · Computer Science 2020-11-25 Mingrui Liu, Wei Zhang, Francesco Orabona, Tianbao Yang

Although adaptive optimization algorithms such as Adam show fast convergence in many machine learning tasks, this paper identifies a problem of Adam by analyzing its performance in a simple non-convex synthetic problem, showing that Adam's…

Machine Learning · Computer Science 2020-05-06 Wenjie Li, Zhaoyang Zhang, Xinjiang Wang, Ping Luo

Adam is a popular variant of stochastic gradient descent for finding a local minimizer of a function. In the constant stepsize regime, assuming that the objective function is differentiable and non-convex, we establish the convergence in…

Machine Learning · Statistics 2020-05-15 Anas Barakat, Pascal Bianchi

Adaptive gradient optimization methods, such as Adam, are prevalent in training deep neural networks across diverse machine learning tasks due to their ability to achieve faster convergence. However, these methods often suffer from…

Machine Learning · Computer Science 2025-02-12 Abulikemu Abuduweili, Changliu Liu

In this paper, we investigate the popular deep learning optimization routine, Adam, from the perspective of statistical moments. While Adam is an adaptive lower-order moment based (of the stochastic gradient) method, we propose an extension…

Machine Learning · Computer Science 2019-10-16 Zhanhong Jiang, Aditya Balu, Sin Yong Tan, Young M Lee, Chinmay Hegde, Soumik Sarkar

Since the 21st century, artificial intelligence has been leading a new round of industrial revolution. Under the training framework, the optimization algorithm aims to stably converge high-dimensional optimization to local and even global…

Machine Learning · Computer Science 2025-12-02 Meng Zhu, Quan Xiao, Weidong Min

Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in robotics domain. To perform well even with such noise, we expect them to be able to detect outliers and discard them when…

Machine Learning · Computer Science 2020-03-04 Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Kenji Sugimoto

Several recently proposed stochastic optimization methods that have been successfully used in training deep networks such as RMSProp, Adam, Adadelta, Nadam are based on using gradient updates scaled by square roots of exponential moving…

Machine Learning · Computer Science 2019-04-22 Sashank J. Reddi, Satyen Kale, Sanjiv Kumar

Adam-type algorithms have become a preferred choice for optimisation in the deep learning setting, however, despite success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type…

Machine Learning · Computer Science 2024-09-24 Yiming Jiang, Jinlan Liu, Dongpo Xu, Danilo P. Mandic

In this paper, we introduce StochGradAdam, a novel optimizer designed as an extension of the Adam algorithm, incorporating stochastic gradient sampling techniques to improve computational efficiency while maintaining robust performance.…

Machine Learning · Computer Science 2025-03-19 Juyoung Yun

This paper studies a class of adaptive gradient based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as the "Adam-type", includes the popular…

Machine Learning · Computer Science 2019-03-12 Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong

In this paper, we study the convergence of the Adaptive Moment Estimation (Adam) algorithm under unconstrained non-convex smooth stochastic optimizations. Despite the widespread usage in machine learning areas, its theoretical properties…

Optimization and Control · Mathematics 2023-11-06 Yusu Hong, Junhong Lin

Adaptive gradient-based optimization methods such as Adagrad, Rmsprop, and Adam are widely used in solving large-scale machine learning problems including deep learning. A number of schemes have been proposed in…

Machine Learning · Computer Science 2019-05-30 Parvin Nazari, Davoud Ataee Tarzanagh, George Michailidis

Stochastic optimization algorithms using exponential moving averages of the past gradients, such as ADAM, RMSProp and AdaGrad, have been having great successes in many applications, especially in training deep neural networks. ADAM in…

Machine Learning · Computer Science 2026-01-30 Ruiqi Wang, Diego Klabjan

Adaptive gradient methods have become popular in optimizing deep neural networks; recent examples include AdaGrad and Adam. Although Adam usually converges faster, variations of Adam, for instance, the AdaBelief algorithm, have been…

Machine Learning · Computer Science 2024-10-29 Kushal Chakrabarti, Nikhil Chopra

In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimate (Adam) algorithm for a wide class of optimization objectives. Despite the popularity and efficiency of the Adam algorithm in training deep neural…

Optimization and Control · Mathematics 2023-11-08 Haochuan Li, Alexander Rakhlin, Ali Jadbabaie

In this paper we propose several adaptive gradient methods for stochastic optimization. Unlike AdaGrad-type of methods, our algorithms are based on Armijo-type line search and they simultaneously adapt to the unknown Lipschitz constant of…

An algorithm is presented for momentum gradient descent optimization based on the first-order differential equation of the Newtonian dynamics. The fictitious mass is introduced to the dynamics of momentum for regularizing the adaptive…

Machine Learning · Computer Science 2018-05-15 Zhidong Han

Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling. However, these methods maintain second-order statistics for each parameter,…

Machine Learning · Computer Science 2019-09-13 Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer