Related papers: TAdam: A Robust Stochastic Gradient Optimizer

Stochastic Gradient Sampling for Enhancing Neural Networks Training

In this paper, we introduce StochGradAdam, a novel optimizer designed as an extension of the Adam algorithm, incorporating stochastic gradient sampling techniques to improve computational efficiency while maintaining robust performance.…

Machine Learning · Computer Science 2025-03-19 Juyoung Yun

Adam: A Method for Stochastic Optimization

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has…

Machine Learning · Computer Science 2017-01-31 Diederik P. Kingma , Jimmy Ba

CAdam: Confidence-Based Optimization for Online Learning

Modern recommendation systems frequently employ online learning to dynamically update their models with freshly collected data. The most commonly used optimizer for updating neural networks in these contexts is the Adam optimizer, which…

Machine Learning · Computer Science 2025-06-05 Shaowen Wang , Anan Liu , Jian Xiao , Huan Liu , Yuekui Yang , Cong Xu , Qianqian Pu , Suncong Zheng , Wei Zhang , Di Wang , Jie Jiang , Jian Li

AdaTerm: Adaptive T-Distribution Estimated Robust Moments for Noise-Robust Stochastic Gradient Optimization

With the increasing practicality of deep learning applications, practitioners are inevitably faced with datasets corrupted by noise from various sources such as measurement errors, mislabeling, and estimated surrogate inputs/outputs that…

Machine Learning · Computer Science 2023-08-30 Wendyam Eric Lionel Ilboudo , Taisuke Kobayashi , Takamitsu Matsubara

On Higher-order Moments in Adam

In this paper, we investigate the popular deep learning optimization routine, Adam, from the perspective of statistical moments. While Adam is an adaptive lower-order moment based (of the stochastic gradient) method, we propose an extension…

Machine Learning · Computer Science 2019-10-16 Zhanhong Jiang , Aditya Balu , Sin Yong Tan , Young M Lee , Chinmay Hegde , Soumik Sarkar

Tom: Leveraging trend of the observed gradients for faster convergence

The success of deep learning can be attributed to various factors such as increase in computational power, large datasets, deep convolutional neural networks, optimizers etc. Particularly, the choice of optimizer affects the generalization,…

Machine Learning · Computer Science 2021-09-10 Anirudh Maiya , Inumella Sricharan , Anshuman Pandey , Srinivas K. S

Towards Deep Robot Learning with Optimizer applicable to Non-stationary Problems

This paper proposes a new optimizer for deep learning, named d-AmsGrad. In the real-world data, noise and outliers cannot be excluded from dataset to be used for learning robot skills. This problem is especially striking for robots that…

Machine Learning · Computer Science 2021-04-02 Taisuke Kobayashi

Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate

First-order optimization algorithms have been proven prominent in deep learning. In particular, algorithms such as RMSProp and Adam are extremely popular. However, recent works have pointed out the lack of "long-term memory" in Adam-like…

Machine Learning · Computer Science 2020-12-01 Haiwen Huang , Chang Wang , Bin Dong

SADAM: Stochastic Adam, A Stochastic Operator for First-Order Gradient-based Optimizer

In this work, to efficiently help escape the stationary and saddle points, we propose, analyze, and generalize a stochastic strategy performed as an operator for a first-order gradient descent algorithm in order to increase the target…

Machine Learning · Computer Science 2022-05-23 Wei Zhang , Yu Bao

An Improved Adaptive PID Optimizer with Enhanced Convergence and Stability for Deep Learning

Optimization is essential in deep learning. The foundational method upon which most optimizers are built is momentum-based stochastic gradient descent. However, it suffers from two key drawbacks. First, it has noisy and varying gradients,…

Machine Learning · Computer Science 2026-05-22 Saurabh Saini , Kapil Ahuja , Thomas Wick , Saurav Kumar

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions

The Adaptive Momentum Estimation (Adam) algorithm is highly effective in training various deep learning tasks. Despite this, there's limited theoretical understanding for Adam, especially when focusing on its vanilla form in non-convex…

Optimization and Control · Mathematics 2025-02-25 Yusu Hong , Junhong Lin

A Theoretical and Experimental Study of a Novel Adaptive Learning Algorithm

A crucial component of machine learning algorithms is minimizing loss functions with less computational cost and less oscillations. While adaptive learning rate-based optimizers have been widely used for real-world tasks, they do not…

Machine Learning · Computer Science 2026-05-29 Sakshi Kumari , Shyam Kumar M , Sushmitha P

Improved Performance of Stochastic Gradients with Gaussian Smoothing

This paper formalizes and analyzes Gaussian smoothing applied to two prominent optimization methods: Stochastic Gradient Descent (GSmoothSGD) and Adam (GSmoothAdam) in deep learning. By attenuating small fluctuations, Gaussian smoothing…

Optimization and Control · Mathematics 2024-11-19 Andrew Starnes , Clayton Webster

A Physics-Inspired Optimizer: Velocity Regularized Adam

We introduce Velocity-Regularized Adam (VRAdam), a physics-inspired optimizer for training deep neural networks that draws on ideas from quartic terms for kinetic energy with its stabilizing effects on various system dynamics. Previous…

Machine Learning · Computer Science 2026-05-13 Pranav Vaidhyanathan , Lucas Schorling , Natalia Ares , Michael A. Osborne

ANO: Faster is Better in Noisy Landscape

Stochastic optimizers are central to deep learning, yet widely used methods such as Adam and Adan can degrade in non-stationary or noisy environments, partly due to their reliance on momentum-based magnitude estimates. We introduce Ano, a…

Machine Learning · Computer Science 2025-11-11 Adrien Kegreisz

Lookahead Optimizer: k steps forward, 1 step back

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate…

Machine Learning · Computer Science 2019-12-04 Michael R. Zhang , James Lucas , Geoffrey Hinton , Jimmy Ba

MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce…

Machine Learning · Computer Science 2024-06-18 Kaan Ozkara , Can Karakus , Parameswaran Raman , Mingyi Hong , Shoham Sabach , Branislav Kveton , Volkan Cevher

Divergence Results and Convergence of a Variance Reduced Version of ADAM

Stochastic optimization algorithms using exponential moving averages of the past gradients, such as ADAM, RMSProp and AdaGrad, have been having great successes in many applications, especially in training deep neural networks. ADAM in…

Machine Learning · Computer Science 2026-01-30 Ruiqi Wang , Diego Klabjan

Adam⁺: A Stochastic Method with Adaptive Variance Reduction

Adam is a widely used stochastic optimization method for deep learning applications. While practitioners prefer Adam because it requires less parameter tuning, its use is problematic from a theoretical point of view since it may not…

Machine Learning · Computer Science 2020-11-25 Mingrui Liu , Wei Zhang , Francesco Orabona , Tianbao Yang

EAdam Optimizer: How ε Impact Adam

Many adaptive optimization methods have been proposed and used in deep learning, in which Adam is regarded as the default algorithm and widely used in many deep learning frameworks. Recently, many variants of Adam, such as Adabound, RAdam…

Machine Learning · Computer Science 2020-11-05 Wei Yuan , Kai-Xin Gao