Related papers: BAdam: A Memory Efficient Full Parameter Optimizat…

AlphaAdam: Asynchronous Masked Optimization with Dynamic Alpha for Selective Updates

In the training of large language models (LLMs), updating parameters more efficiently and stably has always been an important challenge. To achieve efficient parameter updates, existing methods usually achieve performance comparable to full…

Machine Learning · Computer Science 2025-02-06 Da Chang, Yu Li, Ganzhao Yuan

LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics

We introduce LDAdam, a memory-efficient optimizer for training large models, that performs adaptive optimization steps within lower dimensional subspaces, while consistently exploring the full parameter space during training. This strategy…

Machine Learning · Computer Science 2025-03-04 Thomas Robert, Mher Safaryan, Ionut-Vlad Modoranu, Dan Alistarh

Exploiting Block Coordinate Descent for Cost-Effective LLM Model Training

Training large language models typically demands extensive GPU memory and substantial financial investment, which poses a barrier for many small- to medium-sized teams. In this paper, we propose a full-parameter pre-training and fine-tuning…

Machine Learning · Computer Science 2025-09-29 Zeyu Liu, Yan Li, Yunquan Zhang, Boyang Zhang, Guoyong Jiang, Xin Zhang, Limin Xiao, Weifeng Zhang, Daning Cheng

FOAM: Blocked State Folding for Memory-Efficient LLM Training

Large language models (LLMs) have demonstrated remarkable performance due to their large parameter counts and extensive training data. However, their scale leads to significant memory bottlenecks during training, especially when using…

Machine Learning · Computer Science 2026-05-14 Ziqing Wen, Jiahuan Wang, Ping Luo, Dongsheng Li, Tao Sun

Full Parameter Fine-tuning for Large Language Models with Limited Resources

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training. Lowering the threshold for LLM training would encourage greater participation from researchers, benefiting…

Computation and Language · Computer Science 2024-06-07 Kai Lv, Yuqing Yang, Tengxiao Liu, Qinghui Gao, Qipeng Guo, Xipeng Qiu

AdaPM: a Partial Momentum Algorithm for LLM Training

In the training of large language models, momentum is widely used and often demonstrated to achieve significant acceleration. However, storing momentum typically presents memory challenges. In this paper, we propose AdaPM, an adaptive…

Machine Learning · Computer Science 2025-10-13 Yimu Zhang, Yuanshi Liu, Cong Fang

LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation

Low-rank adaptation (LoRA) has become the default approach to fine-tune large language models (LLMs) due to its significant reduction in trainable parameters. However, trainable parameter demand for LoRA increases with increasing model…

Computation and Language · Computer Science 2024-06-19 Seyedarmin Azizi, Souvik Kundu, Massoud Pedram

FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

With the increase in the number of parameters in large language models, the process of pre-training and fine-tuning increasingly demands larger volumes of GPU memory. A significant portion of this memory is typically consumed by the…

Machine Learning · Computer Science 2025-08-15 Philip Zmushko, Aleksandr Beznosikov, Martin Takáč, Samuel Horváth

LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs

Fine-tuning pre-trained large language models (LLMs) with limited hardware presents challenges due to GPU memory constraints. Various distributed fine-tuning methods have been proposed to alleviate memory constraints on GPU. However,…

Artificial Intelligence · Computer Science 2024-04-18 Taeho Kim, Yanming Wang, Vatshank Chaturvedi, Lokesh Gupta, Seyeon Kim, Yongin Kwon, Sangtae Ha

BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks

Training large language models (LLMs) for pretraining or adapting to new tasks and domains has become increasingly critical as their applications expand. However, as the model and the data sizes grow, the training process presents…

Machine Learning · Computer Science 2024-12-17 Amrutha Varshini Ramesh, Vignesh Ganapathiraman, Issam H. Laradji, Mark Schmidt

BADM: Batch ADMM for Deep Learning

Stochastic gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence. To address the challenge, we leverage the framework of the alternating direction method of multipliers…

Machine Learning · Computer Science 2025-02-03 Ouya Wang, Shenglong Zhou, Geoffrey Ye Li

Resource-Efficient Fine-Tuning of LLaMA-3.2-3B for Medical Chain-of-Thought Reasoning

Large Language Models (LLMs) such as GPT-4 and LLaMA have demonstrated remarkable reasoning abilities but require significant computational resources for fine-tuning. This paper presents a resource-efficient fine-tuning approach for…

Computation and Language · Computer Science 2025-10-07 Imran Mansha

Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning

This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models (MLLMs). By conceptualizing this transformation as a domain adaptation process, i.e., transitioning from text…

Computation and Language · Computer Science 2023-12-19 Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, Cihang Xie

BGADAM: Boosting based Genetic-Evolutionary ADAM for Neural Network Optimization

For various optimization methods, gradient descent-based algorithms can achieve outstanding performance and have been widely used in various tasks. Among those commonly used algorithms, ADAM has many advantages such as fast convergence…

Neural and Evolutionary Computing · Computer Science 2021-05-05 Jiyang Bai, Yuxiang Ren, Jiawei Zhang

BitDelta: Your Fine-Tune May Only Be Worth One Bit

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it's intuitive to assume that…

Machine Learning · Computer Science 2024-10-15 James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models

The memory challenges associated with training Large Language Models (LLMs) have become a critical concern, particularly when using the Adam optimizer. To address this issue, numerous memory-efficient techniques have been proposed, with…

Machine Learning · Computer Science 2025-02-12 Yiming Chen, Yuan Zhang, Yin Liu, Kun Yuan, Zaiwen Wen

Accelerating Large Language Model Training with 4D Parallelism and Memory Consumption Estimator

In large language model (LLM) training, several parallelization strategies, including Tensor Parallelism (TP), Pipeline Parallelism (PP), Data Parallelism (DP), as well as Sequence Parallelism (SP) and Context Parallelism (CP), are employed…

Machine Learning · Computer Science 2024-11-12 Kazuki Fujii, Kohei Watanabe, Rio Yokota

JudgeLM: Fine-tuned Large Language Models are Scalable Judges

Evaluating Large Language Models (LLMs) in open-ended scenarios is challenging because existing benchmarks and metrics cannot measure them comprehensively. To address this problem, we propose to fine-tune LLMs as scalable judges (JudgeLM)…

Computation and Language · Computer Science 2025-03-04 Lianghui Zhu, Xinggang Wang, Xinlong Wang

Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models

Large Language Models (LLMs) have significantly advanced natural language processing with exceptional task generalization capabilities. Low-Rank Adaptation (LoRA) offers a cost-effective fine-tuning solution, freezing the original model…

Machine Learning · Computer Science 2025-03-18 Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Yang You, Guiming Xie, Xuejian Gong, Kunlong Zhou

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory…

Machine Learning · Computer Science 2024-05-29 Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen