Related papers: Memory Augmented Optimizers for Deep Learning

200 papers

The optimization-based meta-learning approach is gaining increased traction because of its unique ability to quickly adapt to a new task using only small amounts of data. However, existing optimization-based meta-learning approaches, such…

Machine Learning · Computer Science 2024-12-17 Honglin Yang, Ji Ma, Xiao Yu

Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam,…

In modern deep learning, the models are learned by applying gradient updates using an optimizer, which transforms the updates based on various statistics. Optimizers are often hand-designed and tuning their hyperparameters is a big part of…

Machine Learning · Computer Science 2024-10-08 Gus Kristiansen, Mark Sandler, Andrey Zhmoginov, Nolan Miller, Anirudh Goyal, Jihwan Lee, Max Vladymyrov

In modern optimization methods used in deep learning, each update depends on the history of previous iterations, often referred to as memory, and this dependence decays fast as the iterates go further into the past. For example, gradient…

Machine Learning · Computer Science 2026-01-14 Matias D. Cattaneo, Boris Shigida

Existing analyses of optimization in deep learning are either continuous, focusing on (variants of) gradient flow, or discrete, directly treating (variants of) gradient descent. Gradient flow is amenable to theoretical analysis, but is…

Machine Learning · Computer Science 2021-12-30 Omer Elkabetz, Nadav Cohen

Interpreting gradient methods as fixed-point iterations, we provide a detailed analysis of those methods for minimizing convex objective functions. Due to their conceptual and algorithmic simplicity, gradient methods are widely used in…

Machine Learning · Statistics 2017-08-16 Alexander Jung

Deep neural networks achieve state-of-the-art and sometimes super-human performance across various domains. However, when learning tasks sequentially, the networks easily forget the knowledge of previous tasks, known as "catastrophic…

Computer Vision and Pattern Recognition · Computer Science 2021-05-18 Shixiang Tang, Dapeng Chen, Jinguo Zhu, Shijie Yu, Wanli Ouyang

In recent years, various gradient descent algorithms including the methods of gradient descent, gradient descent with momentum, adaptive gradient (AdaGrad), root-mean-square propagation (RMSProp) and adaptive moment estimation (Adam)…

Machine Learning · Computer Science 2024-09-19 Abel C. H. Chen

Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We…

We propose a first-order method for convex optimization, where instead of being restricted to the gradient from a single parameter, gradients from multiple parameters can be used during each step of gradient descent. This setup is…

Machine Learning · Computer Science 2023-02-08 Yash Chandak, Shiv Shankar, Venkata Gandikota, Philip S. Thomas, Arya Mazumdar

Optimisers are an essential component for training machine learning models, and their design influences learning speed and generalisation. Several studies have attempted to learn more effective gradient-descent optimisers via solving a…

Machine Learning · Computer Science 2022-03-08 Boyan Gao, Henry Gouk, Hae Beom Lee, Timothy M. Hospedales

One of the most important parts of Artificial Neural Networks is minimizing the loss function, which tells us how good or bad our model is. To minimize these losses we need to tune the weights and biases. Also to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav

In this paper, we consider gradient methods for minimizing smooth convex functions, which employ the information obtained at the previous iterations in order to accelerate the convergence towards the optimal solution. This information is…

Optimization and Control · Mathematics 2021-06-02 Yurii Nesterov, Mihai I. Florea

An associative memory (AM) enables cue-response recall, and associative memorization has recently been noted to underlie the operation of modern neural architectures such as Transformers. This work addresses a distributed setting where…

Machine Learning · Computer Science 2026-04-24 Bowen Wang, Matteo Zecchin, Osvaldo Simeone

Recent analyses of certain gradient descent optimization methods have shown that performance can degrade in some settings - such as with stochasticity or implicit momentum. In deep reinforcement learning (Deep RL), such optimization methods…

Machine Learning · Computer Science 2018-10-08 Peter Henderson, Joshua Romoff, Joelle Pineau

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…

Machine Learning · Computer Science 2019-03-12 Jiawei Zhang

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the…

Machine Learning · Computer Science 2019-06-12 Difan Zou, Quanquan Gu

The promise of learning to learn for robotics rests on the hope that by extracting some information about the learning process itself we can speed up subsequent similar learning tasks. Here, we introduce a computationally efficient online…

Machine Learning · Computer Science 2018-03-28 Franziska Meier, Daniel Kappler, Stefan Schaal

Gradient-based optimization drives the unprecedented performance of modern deep neural network models across diverse applications. Adaptive algorithms have accelerated neural network training due to their rapid convergence rates; however,…

Machine Learning · Computer Science 2025-05-06 Chia-Wei Hsu, Nien-Ti Tsou, Yu-Cheng Chen, Yang Jeong Park, Ju Li

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar