Related papers: How Memory in Optimization Algorithms Implicitly M…

Memory Augmented Optimizers for Deep Learning

Popular approaches for minimizing loss in data-driven learning often involve an abstraction or an explicit retention of the history of gradients for efficient parameter updates. The aggregated history of gradients nudges the parameter…

Machine Learning · Computer Science 2021-06-22 Paul-Aymeric McRae, Prasanna Parthasarathi, Mahmoud Assran, Sarath Chandar

Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization

The vast majority of modern deep learning models are trained with momentum-based first-order optimizers. The momentum term governs the optimizer's memory by determining how much each past gradient contributes to the current convergence…

Machine Learning · Computer Science 2026-05-12 Kristi Topollai, Anna Choromanska

Resetting the Optimizer in Deep RL: An Empirical Study

We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process is comprised of solving a sequence of optimization problems where the loss function changes per iteration. The common…

Machine Learning · Computer Science 2023-11-16 Kavosh Asadi, Rasool Fakoor, Shoham Sabach

Narrowing the Focus: Learned Optimizers for Pretrained Models

In modern deep learning, the models are learned by applying gradient updates using an optimizer, which transforms the updates based on various statistics. Optimizers are often hand-designed and tuning their hyperparameters is a big part of…

Machine Learning · Computer Science 2024-10-08 Gus Kristiansen, Mark Sandler, Andrey Zhmoginov, Nolan Miller, Anirudh Goyal, Jihwan Lee, Max Vladymyrov

Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods

Recent analyses of certain gradient descent optimization methods have shown that performance can degrade in some settings - such as with stochasticity or implicit momentum. In deep reinforcement learning (Deep RL), such optimization methods…

Machine Learning · Computer Science 2018-10-08 Peter Henderson, Joshua Romoff, Joelle Pineau

Development of Deep Learning Optimizers: Approaches, Concepts, and Update Rules

Deep learning optimizers are optimization algorithms that enable deep neural networks to learn. The effectiveness of learning is highly dependent on the optimizer employed in the training process. Alongside the rapid advancement of deep…

Machine Learning · Computer Science 2025-09-24 Doğay Altınel

A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning

One of the most important parts of Artificial Neural Networks is minimizing the loss function, which tells us how good or bad our model is. To minimize these losses we need to tune the weights and biases. Also to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav

Symbolic Discovery of Optimization Algorithms

We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient search techniques to explore an infinite and sparse program…

Machine Learning · Computer Science 2023-05-10 Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Yao Liu, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, Quoc V. Le

Layerwise Optimization by Gradient Decomposition for Continual Learning

Deep neural networks achieve state-of-the-art and sometimes super-human performance across various domains. However, when learning tasks sequentially, the networks easily forget the knowledge of previous tasks, known as "catastrophic…

Computer Vision and Pattern Recognition · Computer Science 2021-05-18 Shixiang Tang, Dapeng Chen, Jinguo Zhu, Shijie Yu, Wanli Ouyang

Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations

Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam,…

Machine Learning · Computer Science 2026-04-15 Tong Zhang, Jiangning Zhang, Zhucun Xue, Juntao Jiang, Yicheng Xu, Chengming Xu, Teng Hu, Xingyu Xie, Xiaobin Hu, Yabiao Wang, Yong Liu, Shuicheng Yan

Learning Gradient Descent: Better Generalization and Longer Horizons

Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and…

Machine Learning · Computer Science 2017-06-13 Kaifeng Lv, Shunhua Jiang, Jian Li

Memory-Reduced Meta-Learning with Guaranteed Convergence

The optimization-based meta-learning approach is gaining increased traction because of its unique ability to quickly adapt to a new task using only small amounts of data. However, existing optimization-based meta-learning approaches, such…

Machine Learning · Computer Science 2024-12-17 Honglin Yang, Ji Ma, Xiao Yu

From inexact optimization to learning via gradient concentration

Optimization in machine learning typically deals with the minimization of empirical objectives defined by training data. However, the ultimate goal of learning is to minimize the error on future data (test error), for which the training…

Machine Learning · Statistics 2021-11-08 Bernhard Stankewitz, Nicole Mücke, Lorenzo Rosasco

On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

Conventional wisdom in deep learning states that increasing depth improves expressiveness but complicates optimization. This paper suggests that, sometimes, increasing depth can speed up optimization. The effect of depth on optimization is…

Machine Learning · Computer Science 2018-06-12 Sanjeev Arora, Nadav Cohen, Elad Hazan

The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks

Despite their overwhelming capacity to overfit, deep neural networks trained by specific optimization algorithms tend to generalize well to unseen data. Recently, researchers explained it by investigating the implicit regularization effect…

Machine Learning · Computer Science 2021-12-17 Bohan Wang, Qi Meng, Wei Chen, Tie-Yan Liu

Training Memory in Deep Neural Networks: Mechanisms, Evidence, and Measurement Gaps

Modern deep-learning training is not memoryless. Updates depend on optimizer moments and averaging, data-order policies (random reshuffling vs with-replacement, staged augmentations and replay), the nonconvex path, and auxiliary state…

Machine Learning · Computer Science 2026-01-30 Vasileios Sevetlidis, George Pavlidis

Investigation into the Training Dynamics of Learned Optimizers

Optimization is an integral part of modern deep learning. Recently, the concept of learned optimizers has emerged as a way to accelerate this optimization process by replacing traditional, hand-crafted algorithms with meta-learned…

Machine Learning · Computer Science 2023-12-13 Jan Sobotka, Petr Šimánek, Daniel Vašata

Does Momentum Change the Implicit Regularization on Separable Data?

The momentum acceleration technique is widely adopted in many optimization algorithms. However, there is no theoretical answer on how the momentum affects the generalization performance of the optimization algorithms. This paper studies…

Machine Learning · Computer Science 2022-05-30 Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Implicit Regularization in Deep Matrix Factorization

Efforts to understand the generalization mystery in deep learning have led to the belief that gradient-based optimization induces a form of implicit regularization, a bias towards models of low "complexity." We study the implicit…

Machine Learning · Computer Science 2019-10-29 Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo

Combining Explicit and Implicit Regularization for Efficient Learning in Deep Networks

Works on implicit regularization have studied gradient trajectories during the optimization process to explain why deep networks favor certain kinds of solutions over others. In deep linear networks, it has been shown that gradient descent…

Machine Learning · Computer Science 2023-06-02 Dan Zhao