Related papers: Compositional ADAM: An Adaptive Compositional Solv…

Local Convergence of Adaptive Gradient Descent Optimizers

Adaptive Moment Estimation (ADAM) is a very popular training algorithm for deep neural networks and belongs to the family of adaptive gradient descent optimizers. However, to the best of the authors' knowledge, no complete convergence analysis…

Machine Learning · Computer Science 2021-02-22 Sebastian Bock, Martin Georg Weiß

Learning compositional functions via multiplicative weight updates

Compositionality is a basic structural feature of both biological and artificial neural networks. Learning compositional functions via gradient descent incurs well known problems like vanishing and exploding gradients, making careful…

Neural and Evolutionary Computing · Computer Science 2021-01-11 Jeremy Bernstein, Jiawei Zhao, Markus Meister, Ming-Yu Liu, Anima Anandkumar, Yisong Yue

SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients

Adaptive gradient methods have shown excellent performances for solving many machine learning problems. Although multiple adaptive gradient methods were recently studied, they mainly focus on either empirical or theoretical aspects and also…

Optimization and Control · Mathematics 2022-05-13 Feihu Huang, Junyi Li, Heng Huang

Estimation for Compositional Data using Measurements from Nonlinear Systems using Artificial Neural Networks

Our objective is to estimate the unknown compositional input from its output response through an unknown system after estimating the inverse of the original system with a training set. The proposed methods using artificial neural networks…

Machine Learning · Computer Science 2020-01-27 Se Un Park

Differentiable Linearized ADMM

Recently, a number of learning-based optimization methods that combine data-driven architectures with the classical optimization algorithms have been proposed and explored, showing superior empirical performance in solving various ill-posed…

Machine Learning · Computer Science 2019-05-16 Xingyu Xie, Jianlong Wu, Zhisheng Zhong, Guangcan Liu, Zhouchen Lin

An Adaptive Proximal ADMM for Nonconvex Linearly Constrained Composite Programs

This paper develops an adaptive proximal alternating direction method of multipliers (ADMM) for solving linearly constrained, composite optimization problems under the assumption that the smooth component of the objective is weakly convex,…

Optimization and Control · Mathematics 2026-05-04 Leandro Farias Maia, David H. Gutman, Renato D. C. Monteiro, Gilson N. Silva

Discrete Adjoint Matching

Computation methods for solving entropy-regularized reward optimization -- a class of problems widely used for fine-tuning generative models -- have advanced rapidly. Among those, Adjoint Matching (AM, Domingo-Enrich et al., 2025) has…

Machine Learning · Statistics 2026-02-17 Oswin So, Brian Karrer, Chuchu Fan, Ricky T. Q. Chen, Guan-Horng Liu

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions

The Adaptive Momentum Estimation (Adam) algorithm is highly effective in training various deep learning tasks. Despite this, there's limited theoretical understanding for Adam, especially when focusing on its vanilla form in non-convex…

Optimization and Control · Mathematics 2025-02-25 Yusu Hong, Junhong Lin

A Decentralized Adaptive Momentum Method for Solving a Class of Min-Max Optimization Problems

Min-max saddle point games have recently been intensely studied, due to their wide range of applications, including training Generative Adversarial Networks (GANs). However, most of the recent efforts for solving them are limited to special…

Optimization and Control · Mathematics 2021-08-10 Babak Barazandeh, Tianjian Huang, George Michailidis

Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax Optimization

Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization owing to their parameter-agnostic ability -- requiring no a priori knowledge about problem-specific parameters nor tuning of learning rates. However, when…

Optimization and Control · Mathematics 2022-10-17 Junchi Yang, Xiang Li, Niao He

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

In deep learning, different kinds of deep networks typically need different optimizers, which have to be chosen after multiple trials, making the training process inefficient. To relieve this issue and consistently improve the model…

Machine Learning · Computer Science 2024-12-02 Xingyu Xie, Pan Zhou, Huan Li, Zhouchen Lin, Shuicheng Yan

DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization

Adaptive gradient-based optimization methods such as Adagrad, RMSprop, and Adam are widely used in solving large-scale machine learning problems including deep learning. A number of schemes have been proposed in…

Machine Learning · Computer Science 2019-05-30 Parvin Nazari, Davoud Ataee Tarzanagh, George Michailidis

Adaptive Consensus ADMM for Distributed Optimization

The alternating direction method of multipliers (ADMM) is commonly used for distributed model fitting problems, but its performance and reliability depend strongly on user-defined penalty parameters. We study distributed ADMM methods that…

Machine Learning · Computer Science 2017-06-21 Zheng Xu, Gavin Taylor, Hao Li, Mario Figueiredo, Xiaoming Yuan, Tom Goldstein

Convergence of Adam Under Relaxed Assumptions

In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimate (Adam) algorithm for a wide class of optimization objectives. Despite the popularity and efficiency of the Adam algorithm in training deep neural…

Optimization and Control · Mathematics 2023-11-08 Haochuan Li, Alexander Rakhlin, Ali Jadbabaie

CAdam: Confidence-Based Optimization for Online Learning

Modern recommendation systems frequently employ online learning to dynamically update their models with freshly collected data. The most commonly used optimizer for updating neural networks in these contexts is the Adam optimizer, which…

Machine Learning · Computer Science 2025-06-05 Shaowen Wang, Anan Liu, Jian Xiao, Huan Liu, Yuekui Yang, Cong Xu, Qianqian Pu, Suncong Zheng, Wei Zhang, Di Wang, Jie Jiang, Jian Li

Convergence in On-line Learning of Static and Dynamic Systems

The paper derives analytical expressions for the asymptotic average updating direction of the adaptive moment generation (ADAM) algorithm when applied to recursive identification of nonlinear systems. It is proved that the standard…

Systems and Control · Electrical Eng. & Systems 2025-10-24 Torbjörn Wigren, Ruoqi Zhang, Per Mattsson

Solving Stochastic Compositional Optimization is Nearly as Easy as Solving Stochastic Optimization

Stochastic compositional optimization generalizes classic (non-compositional) stochastic optimization to the minimization of compositions of functions. Each composition may introduce an additional expectation. The series of expectations may…

Optimization and Control · Mathematics 2021-09-29 Tianyi Chen, Yuejiao Sun, Wotao Yin

Adam: A Method for Stochastic Optimization

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has…

Machine Learning · Computer Science 2017-01-31 Diederik P. Kingma, Jimmy Ba

Alignment Adapter to Improve the Performance of Compressed Deep Learning Models

Compressed Deep Learning (DL) models are essential for deployment in resource-constrained environments. But their performance often lags behind their large-scale counterparts. To bridge this gap, we propose Alignment Adapter (AlAd): a…

Machine Learning · Computer Science 2026-02-17 Rohit Raj Rai, Abhishek Dhaka, Amit Awekar

DEAM: Adaptive Momentum with Discriminative Weight for Stochastic Optimization

Optimization algorithms with momentum, e.g., (ADAM), have been widely used for building deep learning models due to the faster convergence rates compared with stochastic gradient descent (SGD). Momentum helps accelerate SGD in the relevant…

Machine Learning · Computer Science 2020-01-24 Jiyang Bai, Yuxiang Ren, Jiawei Zhang