Related papers: BAdam: A Memory Efficient Full Parameter Optimizat…

200 papers

In the training of large language models (LLMs), updating parameters more efficiently and stably has always been an important challenge. To achieve efficient parameter updates, existing methods usually achieve performance comparable to full…

Machine Learning · Computer Science 2025-02-06 Da Chang, Yu Li, Ganzhao Yuan

We introduce LDAdam, a memory-efficient optimizer for training large models, that performs adaptive optimization steps within lower dimensional subspaces, while consistently exploring the full parameter space during training. This strategy…

Machine Learning · Computer Science 2025-03-04 Thomas Robert, Mher Safaryan, Ionut-Vlad Modoranu, Dan Alistarh

Training large language models typically demands extensive GPU memory and substantial financial investment, which poses a barrier for many small- to medium-sized teams. In this paper, we propose a full-parameter pre-training and fine-tuning…

Machine Learning · Computer Science 2025-09-29 Zeyu Liu, Yan Li, Yunquan Zhang, Boyang Zhang, Guoyong Jiang, Xin Zhang, Limin Xiao, Weifeng Zhang, Daning Cheng

Large language models (LLMs) have demonstrated remarkable performance due to their large parameter counts and extensive training data. However, their scale leads to significant memory bottlenecks during training, especially when using…

Machine Learning · Computer Science 2026-05-14 Ziqing Wen, Jiahuan Wang, Ping Luo, Dongsheng Li, Tao Sun

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training. Lowering the threshold for LLM training would encourage greater participation from researchers, benefiting…

Computation and Language · Computer Science 2024-06-07 Kai Lv, Yuqing Yang, Tengxiao Liu, Qinghui Gao, Qipeng Guo, Xipeng Qiu

In the training of large language models, momentum is widely used and often demonstrated to achieve significant acceleration. However, storing momentum typically presents memory challenges. In this paper, we propose AdaPM, an adaptive…

Machine Learning · Computer Science 2025-10-13 Yimu Zhang, Yuanshi Liu, Cong Fang

Low-rank adaptation (LoRA) has become the default approach to fine-tune large language models (LLMs) due to its significant reduction in trainable parameters. However, trainable parameter demand for LoRA increases with increasing model…

Computation and Language · Computer Science 2024-06-19 Seyedarmin Azizi, Souvik Kundu, Massoud Pedram

With the increase in the number of parameters in large language models, the process of pre-training and fine-tuning increasingly demands larger volumes of GPU memory. A significant portion of this memory is typically consumed by the…

Machine Learning · Computer Science 2025-08-15 Philip Zmushko, Aleksandr Beznosikov, Martin Takáč, Samuel Horváth

Fine-tuning pre-trained large language models (LLMs) with limited hardware presents challenges due to GPU memory constraints. Various distributed fine-tuning methods have been proposed to alleviate memory constraints on GPU. However,…

Artificial Intelligence · Computer Science 2024-04-18 Taeho Kim, Yanming Wang, Vatshank Chaturvedi, Lokesh Gupta, Seyeon Kim, Yongin Kwon, Sangtae Ha

Training large language models (LLMs) for pretraining or adapting to new tasks and domains has become increasingly critical as their applications expand. However, as the model and the data sizes grow, the training process presents…

Machine Learning · Computer Science 2024-12-17 Amrutha Varshini Ramesh, Vignesh Ganapathiraman, Issam H. Laradji, Mark Schmidt

Stochastic gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence. To address the challenge, we leverage the framework of the alternating direction method of multipliers…

Machine Learning · Computer Science 2025-02-03 Ouya Wang, Shenglong Zhou, Geoffrey Ye Li

Large Language Models (LLMs) such as GPT-4 and LLaMA have demonstrated remarkable reasoning abilities but require significant computational resources for fine-tuning. This paper presents a resource-efficient fine-tuning approach for…

Computation and Language · Computer Science 2025-10-07 Imran Mansha

This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models (MLLMs). By conceptualizing this transformation as a domain adaptation process, i.e., transitioning from text…

Computation and Language · Computer Science 2023-12-19 Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, Cihang Xie

For various optimization methods, gradient descent-based algorithms can achieve outstanding performance and have been widely used in various tasks. Among those commonly used algorithms, Adam has many advantages, such as fast convergence…

Neural and Evolutionary Computing · Computer Science 2021-05-05 Jiyang Bai, Yuxiang Ren, Jiawei Zhang

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it's intuitive to assume that…

Machine Learning · Computer Science 2024-10-15 James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

The memory challenges associated with training Large Language Models (LLMs) have become a critical concern, particularly when using the Adam optimizer. To address this issue, numerous memory-efficient techniques have been proposed, with…

Machine Learning · Computer Science 2025-02-12 Yiming Chen, Yuan Zhang, Yin Liu, Kun Yuan, Zaiwen Wen

In large language model (LLM) training, several parallelization strategies, including Tensor Parallelism (TP), Pipeline Parallelism (PP), Data Parallelism (DP), as well as Sequence Parallelism (SP) and Context Parallelism (CP), are employed…

Machine Learning · Computer Science 2024-11-12 Kazuki Fujii, Kohei Watanabe, Rio Yokota

Evaluating Large Language Models (LLMs) in open-ended scenarios is challenging because existing benchmarks and metrics cannot measure them comprehensively. To address this problem, we propose to fine-tune LLMs as scalable judges (JudgeLM)…

Computation and Language · Computer Science 2025-03-04 Lianghui Zhu, Xinggang Wang, Xinlong Wang

Large Language Models (LLMs) have significantly advanced natural language processing with exceptional task generalization capabilities. Low-Rank Adaptation (LoRA) offers a cost-effective fine-tuning solution, freezing the original model…

Machine Learning · Computer Science 2025-03-18 Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Yang You, Guiming Xie, Xuejian Gong, Kunlong Zhou

In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory…
