Related papers: Adaptive Optimization with Examplewise Gradients

Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates

Adaptive gradient methods for stochastic optimization adjust the learning rate for each parameter locally. However, there is also a global learning rate which must be tuned in order to get the best performance. In this paper, we present a…

Machine Learning · Computer Science 2018-06-12 Hiroaki Hayashi , Jayanth Koushik , Graham Neubig

Revisiting the Initial Steps in Adaptive Gradient Descent Optimization

Adaptive gradient optimization methods, such as Adam, are prevalent in training deep neural networks across diverse machine learning tasks due to their ability to achieve faster convergence. However, these methods often suffer from…

Machine Learning · Computer Science 2025-02-12 Abulikemu Abuduweili , Changliu Liu

Rethinking Adam: A Twofold Exponential Moving Average Approach

Adaptive gradient methods, e.g. \textsc{Adam}, have achieved tremendous success in machine learning. Scaling the learning rate element-wisely by a certain form of second moment estimate of gradients, such methods are able to attain rapid…

Machine Learning · Computer Science 2022-02-10 Yizhou Wang , Yue Kang , Can Qin , Huan Wang , Yi Xu , Yulun Zhang , Yun Fu

On the Trend-corrected Variant of Adaptive Stochastic Optimization Methods

Adam-type optimizers, as a class of adaptive moment estimation methods with the exponential moving average scheme, have been successfully used in many applications of deep learning. Such methods are appealing due to the capability on…

Machine Learning · Computer Science 2020-12-17 Bingxin Zhou , Xuebin Zheng , Junbin Gao

Improving Adaptive Moment Optimization via Preconditioner Diagonalization

Modern adaptive optimization methods, such as Adam and its variants, have emerged as the most widely used tools in deep learning over recent years. These algorithms offer automatic mechanisms for dynamically adjusting the update step based…

Machine Learning · Computer Science 2025-02-12 Son Nguyen , Bo Liu , Lizhang Chen , Qiang Liu

Adam: A Method for Stochastic Optimization

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has…

Machine Learning · Computer Science 2017-01-31 Diederik P. Kingma , Jimmy Ba

Improving the Adaptive Moment Estimation (ADAM) stochastic optimizer through an Implicit-Explicit (IMEX) time-stepping approach

The Adam optimizer, often used in Machine Learning for neural network training, corresponds to an underlying ordinary differential equation (ODE) in the limit of very small learning rates. This work shows that the classical Adam algorithm…

Computational Engineering, Finance, and Science · Computer Science 2024-09-17 Abhinab Bhattacharjee , Andrey A. Popov , Arash Sarshar , Adrian Sandu

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

Adaptive gradient methods, which adopt historical gradient information to automatically adjust the learning rate, despite the nice property of fast convergence, have been observed to generalize worse than stochastic gradient descent (SGD)…

Machine Learning · Computer Science 2020-06-24 Jinghui Chen , Dongruo Zhou , Yiqi Tang , Ziyan Yang , Yuan Cao , Quanquan Gu

Stochastic Gradient Sampling for Enhancing Neural Networks Training

In this paper, we introduce StochGradAdam, a novel optimizer designed as an extension of the Adam algorithm, incorporating stochastic gradient sampling techniques to improve computational efficiency while maintaining robust performance.…

Machine Learning · Computer Science 2025-03-19 Juyoung Yun

Variational Stochastic Gradient Descent for Deep Neural Networks

Current state-of-the-art optimizers are adaptive gradient-based optimization methods such as Adam. Recently, there has been an increasing interest in formulating gradient-based optimizers in a probabilistic framework for better modeling the…

Machine Learning · Computer Science 2025-04-21 Haotian Chen , Anna Kuzina , Babak Esmaeili , Jakub M Tomczak

Online Learning Rate Adaptation with Hypergradient Descent

We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by…

Machine Learning · Computer Science 2018-08-23 Atilim Gunes Baydin , Robert Cornish , David Martinez Rubio , Mark Schmidt , Frank Wood

ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization

Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted rising attention of…

Optimization and Control · Mathematics 2020-06-15 Xunpeng Huang , Runxin Xu , Hao Zhou , Zhe Wang , Zhengyang Liu , Lei Li

A Control Theoretic Framework for Adaptive Gradient Optimizers in Machine Learning

Adaptive gradient methods have become popular in optimizing deep neural networks; recent examples include AdaGrad and Adam. Although Adam usually converges faster, variations of Adam, for instance, the AdaBelief algorithm, have been…

Machine Learning · Computer Science 2024-10-29 Kushal Chakrabarti , Nikhil Chopra

We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual Learning Rate And Beyond

In the rapidly advancing field of deep learning, optimising deep neural networks is paramount. This paper introduces a novel method, Enhanced Velocity Estimation (EVE), which innovatively applies different learning rates to distinct…

Machine Learning · Computer Science 2023-08-22 Afshin Khadangi

Memory-Efficient Adaptive Optimization

Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling. However, these methods maintain second-order statistics for each parameter,…

Machine Learning · Computer Science 2019-09-13 Rohan Anil , Vineet Gupta , Tomer Koren , Yoram Singer

An Adaptive Gradient Method with Energy and Momentum

We introduce a novel algorithm for gradient-based optimization of stochastic objective functions. The method may be seen as a variant of SGD with momentum equipped with an adaptive learning rate automatically adjusted by an 'energy'…

Optimization and Control · Mathematics 2022-03-24 Hailiang Liu , Xuping Tian

BFE and AdaBFE: A New Approach in Learning Rate Automation for Stochastic Optimization

In this paper, a new gradient-based optimization approach by automatically adjusting the learning rate is proposed. This approach can be applied to design non-adaptive learning rate and adaptive learning rate. Firstly, I will introduce the…

Machine Learning · Computer Science 2022-07-07 Xin Cao

Step-size Adaptation Using Exponentiated Gradient Updates

Optimizers like Adam and AdaGrad have been very successful in training large-scale neural networks. Yet, the performance of these methods is heavily dependent on a carefully tuned learning rate schedule. We show that in many large-scale…

Machine Learning · Computer Science 2022-02-02 Ehsan Amid , Rohan Anil , Christopher Fifty , Manfred K. Warmuth

Adam$^+$: A Stochastic Method with Adaptive Variance Reduction

Adam is a widely used stochastic optimization method for deep learning applications. While practitioners prefer Adam because it requires less parameter tuning, its use is problematic from a theoretical point of view since it may not…

Machine Learning · Computer Science 2020-11-25 Mingrui Liu , Wei Zhang , Francesco Orabona , Tianbao Yang

Rapidly Adapting Moment Estimation

Adaptive gradient methods such as Adam have been shown to be very effective for training deep neural networks (DNNs) by tracking the second moment of gradients to compute the individual learning rates. Differently from existing methods, we…

Machine Learning · Computer Science 2019-02-26 Guoqiang Zhang , Kenta Niwa , W. Bastiaan Kleijn