Related papers: ProTrain: Efficient LLM Training via Memory-Aware …

SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining

Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank…

Machine Learning · Computer Science · 2024-11-05 · Andi Han, Jiaxiang Li, Wei Huang, Mingyi Hong, Akiko Takeda, Pratik Jawanpuria, Bamdev Mishra

Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking

Fueled by their remarkable ability to tackle diverse tasks across multiple domains, large language models (LLMs) have grown at an unprecedented rate, with some recent models containing trillions of parameters. This growth is accompanied by…

Machine Learning · Computer Science · 2025-05-30 · Athanasios Glentis, Jiaxiang Li, Qiulin Shang, Andi Han, Ioannis Tsaknakis, Quan Wei, Mingyi Hong

vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training

As large language models (LLMs) become widespread in various application domains, a critical challenge the AI community is facing is how to train these large AI models in a cost-effective manner. Existing LLM training plans typically employ…

Machine Learning · Computer Science · 2024-09-11 · Jehyeon Bang, Yujeong Choi, Myeongwoo Kim, Yongdeok Kim, Minsoo Rhu

Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models

Large Language Models (LLMs) have significantly advanced natural language processing with exceptional task generalization capabilities. Low-Rank Adaptation (LoRA) offers a cost-effective fine-tuning solution, freezing the original model…

Machine Learning · Computer Science · 2025-03-18 · Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Yang You, Guiming Xie, Xuejian Gong, Kunlong Zhou

PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration

State-of-the-art convolutional neural networks (CNNs) used in vision applications have large models with numerous weights. Training these models is very compute- and memory-resource intensive. Much research has been done on pruning or…

Machine Learning · Computer Science · 2019-12-10 · Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez

LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

Large Language Models (LLMs), such as LLaMA and T5, have shown exceptional performance across various tasks through fine-tuning. Although low-rank adaptation (LoRA) has emerged to cheaply fine-tune these LLMs on downstream tasks, their…

Machine Learning · Computer Science · 2024-08-08 · Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang

MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models

As Large Language Models (LLMs) grow dramatically in size, there is an increasing trend in compressing and speeding up these models. Previous studies have highlighted the usefulness of gradients for importance scoring in neural network…

Computation and Language · Computer Science · 2024-07-17 · Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi

Towards Efficient Automatic Self-Pruning of Large Language Models

Despite exceptional capabilities, Large Language Models (LLMs) still face deployment challenges due to their enormous size. Post-training structured pruning is a promising solution that prunes LLMs without the need for retraining, reducing…

Machine Learning · Computer Science · 2025-02-21 · Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Fei Chao, Rongrong Ji

LLM-Inspired Pretrain-Then-Finetune for Small-Data, Large-Scale Optimization

We consider small-data, large-scale decision problems in which a firm must make many operational decisions simultaneously (e.g., across a large product portfolio) while observing only a few, potentially noisy, data points per instance.…

Machine Learning · Computer Science · 2026-02-04 · Zishi Zhang, Jinhui Han, Ming Hu, Yijie Peng

NutePrune: Efficient Progressive Pruning with Numerous Teachers for Large Language Models

The considerable size of Large Language Models (LLMs) presents notable deployment challenges, particularly on resource-constrained hardware. Structured pruning offers an effective means to compress LLMs, thereby reducing storage costs and…

Computation and Language · Computer Science · 2024-06-28 · Shengrui Li, Junzhe Chen, Xueting Han, Jing Bai

Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations

The development of large-scale foundation models, particularly Large Language Models (LLMs), is constrained by significant computational and memory bottlenecks. These challenges elevate throughput optimization from a mere engineering task…

Machine Learning · Computer Science · 2026-03-31 · Mayank Jha

APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

Fine-tuning and inference with large language models (LMs) are generally known to be expensive. Parameter-efficient fine-tuning over pretrained LMs reduces training memory by updating a small number of LM parameters but does not improve…

Computation and Language · Computer Science · 2024-06-05 · Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao

PRewrite: Prompt Rewriting with Reinforcement Learning

Prompt engineering is critical for the development of LLM-based applications. However, it is usually done manually in a "trial and error" fashion that can be time-consuming, ineffective, and sub-optimal. Even for the prompts which seemingly…

Artificial Intelligence · Computer Science · 2024-06-11 · Weize Kong, Spurthi Amba Hombaiah, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

Large Product Key Memory for Pretrained Language Models

Product key memory (PKM), proposed by Lample et al. (2019), improves prediction accuracy by increasing model capacity efficiently with insignificant computational overhead. However, its empirical application has been limited to…

Computation and Language · Computer Science · 2020-10-09 · Gyuwan Kim, Tae-Hwan Jung

Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation

With the development of transformer-based large language models (LLMs), they have been applied to many fields due to their remarkable utility, but this comes at a considerable computational cost at deployment. Fortunately, some methods such…

Machine Learning · Computer Science · 2024-08-13 · Harry Dong, Beidi Chen, Yuejie Chi

PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning

Recent advances in fine-tuning large language models (LLMs) have greatly enhanced their usage in domain-specific tasks. Despite the success, fine-tuning continues to rely on repeated and lengthy prompts, which escalate computational…

Computation and Language · Computer Science · 2024-10-17 · Jiaru Zou, Mengyu Zhou, Tao Li, Shi Han, Dongmei Zhang

PromptTuner: SLO-Aware Elastic System for LLM Prompt Tuning

Prompt tuning has become a prominent strategy for enhancing the performance of Large Language Models (LLMs) on downstream tasks. Many IT enterprises now offer Prompt-Tuning-as-a-Service to fulfill the growing demand for prompt tuning LLMs…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-03-06 · Wei Gao, Peng Sun, Dmitrii Ustiugov, Tianwei Zhang, Yonggang Wen

Reversing Large Language Models for Efficient Training and Fine-Tuning

Large Language Models (LLMs) are known for their expensive and time-consuming training. Thus, LLMs are often fine-tuned to address a specific task, starting from the weights of a pretrained LLM that serves as a foundation model. In…

Computation and Language · Computer Science · 2025-12-05 · Eshed Gal, Moshe Eliasof, Javier Turek, Uri Ascher, Eran Treister, Eldad Haber

PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs

Neural Networks can be effectively compressed through pruning, significantly reducing storage and compute demands while maintaining predictive performance. Simple yet effective methods like magnitude pruning remove less important parameters…

Machine Learning · Computer Science · 2025-12-03 · Max Zimmer, Megi Andoni, Christoph Spiegel, Sebastian Pokutta

IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining

Recent advancements in large language models have intensified the need for efficient and deployable models within limited inference budgets. Structured pruning pipelines have shown promise in token efficiency compared to training…

Computation and Language · Computer Science · 2025-03-11 · Yixiao Li, Xianzhi Du, Ajay Jaiswal, Tao Lei, Tuo Zhao, Chong Wang, Jianyu Wang