Related papers: Dynamic Memory Based Adaptive Optimization

200 papers

Training large language models (LLMs) relies on adaptive optimizers such as Adam, which introduce extra operations and require significantly more memory to maintain first- and second-order moments than SGD. While recent works such as…

Machine Learning · Computer Science 2026-05-22 Athanasios Glentis, Jiaxiang Li, Andi Han, Mingyi Hong

The memory challenges associated with training Large Language Models (LLMs) have become a critical concern, particularly when using the Adam optimizer. To address this issue, numerous memory-efficient techniques have been proposed, with…

Machine Learning · Computer Science 2025-02-12 Yiming Chen, Yuan Zhang, Yin Liu, Kun Yuan, Zaiwen Wen

Popular approaches for minimizing loss in data-driven learning often involve an abstraction or an explicit retention of the history of gradients for efficient parameter updates. The aggregated history of gradients nudges the parameter…

Machine Learning · Computer Science 2021-06-22 Paul-Aymeric McRae, Prasanna Parthasarathi, Mahmoud Assran, Sarath Chandar

A framework is introduced for solving a sequence of slowly changing optimization problems, including those arising in regression and classification applications, using optimization algorithms such as stochastic gradient descent (SGD). The…

Machine Learning · Computer Science 2015-09-25 Craig Wilson, Venugopal V. Veeravalli

Optimization plays a costly and crucial role in developing machine learning systems. In learned optimizers, the few hyperparameters of commonly used hand-designed optimizers, e.g. Adam or SGD, are replaced with flexible parametric…

Machine Learning · Computer Science 2022-07-19 Luke Metz, C. Daniel Freeman, James Harrison, Niru Maheswaranathan, Jascha Sohl-Dickstein

The vast majority of modern deep learning models are trained with momentum-based first-order optimizers. The momentum term governs the optimizer's memory by determining how much each past gradient contributes to the current convergence…

Machine Learning · Computer Science 2026-05-12 Kristi Topollai, Anna Choromanska

NLP research has explored different neural model architectures and sizes, datasets, training objectives, and transfer learning techniques. However, the choice of optimizer during training has not been explored as extensively. Typically,…

Computation and Language · Computer Science 2024-02-13 Nefeli Gkouti, Prodromos Malakasiotis, Stavros Toumpis, Ion Androutsopoulos

Adam is the go-to optimizer for training modern machine learning models, but it requires additional memory to maintain the moving averages of the gradients and their squares. While various low-memory optimizers have been proposed that…

Machine Learning · Computer Science 2025-03-19 Dayal Singh Kalra, John Kirchenbauer, Maissam Barkeshli, Tom Goldstein

We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process is comprised of solving a sequence of optimization problems where the loss function changes per iteration. The common…

Machine Learning · Computer Science 2023-11-16 Kavosh Asadi, Rasool Fakoor, Shoham Sabach

Class-Incremental Learning aims to update a deep classifier to learn new categories while maintaining or improving its accuracy on previously observed classes. Common methods to prevent forgetting previously learned classes include…

Machine Learning · Computer Science 2024-07-02 Elif Ceren Gok Yildirim, Murat Onur Yildirim, Mert Kilickaya, Joaquin Vanschoren

Training large language models requires optimization algorithms that are not only statistically effective, but also computationally and memory efficient at extreme scale. Although Adam remains the dominant optimizer for large-scale…

Machine Learning · Computer Science 2026-05-12 Aditya Ranganath

Reinforcement learning (RL), particularly RL from verifiable reward (RLVR), has become a crucial phase of training large language models (LLMs) and a key focus of current scaling efforts. However, optimization practices in RL largely follow…

Machine Learning · Computer Science 2026-02-25 Sagnik Mukherjee, Lifan Yuan, Pavan Jayasinha, Dilek Hakkani-Tür, Hao Peng

Designing efficient optimizers for large language models (LLMs) with low-memory requirements and fast convergence is an important and challenging problem. This paper makes a step towards the systematic design of such optimizers through the…

Machine Learning · Computer Science 2025-02-21 Wenbo Gong, Meyer Scetbon, Chao Ma, Edward Meeds

A framework previously introduced in [3] for solving a sequence of stochastic optimization problems with bounded changes in the minimizers is extended and applied to machine learning problems such as regression and classification. The…

Machine Learning · Computer Science 2019-04-08 Craig Wilson, Yuheng Bu, Venugopal Veeravalli

We introduce LDAdam, a memory-efficient optimizer for training large models, that performs adaptive optimization steps within lower dimensional subspaces, while consistently exploring the full parameter space during training. This strategy…

Machine Learning · Computer Science 2025-03-04 Thomas Robert, Mher Safaryan, Ionut-Vlad Modoranu, Dan Alistarh

Recently, many first- and second-order variants of SGD have been proposed to facilitate training of Deep Neural Networks (DNNs). A common limitation of these works stems from the fact that they use the same learning rate across all instances…

Machine Learning · Computer Science 2021-05-31 Shreyas Saxena, Nidhi Vyas, Dennis DeCoste

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate…

Machine Learning · Computer Science 2019-12-04 Michael R. Zhang, James Lucas, Geoffrey Hinton, Jimmy Ba

First-order stochastic optimization methods are currently the most widely used class of methods for training deep neural networks. However, the choice of the optimizer has become an ad-hoc rule that can significantly affect the performance.…

Machine Learning · Computer Science 2020-10-21 Samy Jelassi, Aaron Defazio

With the rapid development of natural language processing technology, large language models (LLMs) have achieved remarkable results in a variety of tasks. However, how to effectively train these huge models and improve their…

Artificial Intelligence · Computer Science 2024-12-09 Jiajing Chen, Bingying Liu, Xiaoxuan Liao, Jia Gao, Hongye Zheng, Yue Li

In modern optimization methods used in deep learning, each update depends on the history of previous iterations, often referred to as memory, and this dependence decays fast as the iterates go further into the past. For example, gradient…

Machine Learning · Computer Science 2026-01-14 Matias D. Cattaneo, Boris Shigida