Related papers: MINI-LLM: Memory-Efficient Structured Pruning for …

SlimLLM: Accurate Structured Pruning for Large Language Models

Large language models (LLMs) have garnered significant attention and demonstrated impressive capabilities in a wide range of applications. However, due to their enormous computational costs, the deployment and application of LLMs are often…

Machine Learning · Computer Science 2025-05-30 Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang

Pruning Large Language Models by Identifying and Preserving Functional Networks

Structured pruning is one of the representative techniques for compressing large language models (LLMs) to reduce GPU memory consumption and accelerate inference speed. It offers significant practical value in improving the efficiency of…

Computation and Language · Computer Science 2025-08-08 Yiheng Liu, Junhao Ning, Sichen Xia, Xiaohui Gao, Ning Qiang, Bao Ge, Junwei Han, Xintao Hu

DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models

Large Language Models (LLMs) have achieved remarkable success in various natural language processing tasks, including language modeling, understanding, and generation. However, the increased memory and computational costs associated with…

Computation and Language · Computer Science 2024-11-05 Shangqian Gao, Chi-Heng Lin, Ting Hua, Tang Zheng, Yilin Shen, Hongxia Jin, Yen-Chang Hsu

Towards Efficient Automatic Self-Pruning of Large Language Models

Despite exceptional capabilities, Large Language Models (LLMs) still face deployment challenges due to their enormous size. Post-training structured pruning is a promising solution that prunes LLMs without the need for retraining, reducing…

Machine Learning · Computer Science 2025-02-21 Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Fei Chao, Rongrong Ji

Frustratingly Easy Task-aware Pruning for Large Language Models

Pruning provides a practical solution to reduce the resources required to run large language models (LLMs) to benefit from their effective capabilities as well as control their cost for training and inference. Research on LLM pruning often…

Computation and Language · Computer Science 2025-10-28 Yuanhe Tian, Junjie Liu, Xican Yang, Haishan Ye, Yan Song

LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

Large Language Models (LLMs), such as LLaMA and T5, have shown exceptional performance across various tasks through fine-tuning. Although low-rank adaption (LoRA) has emerged to cheaply fine-tune these LLMs on downstream tasks, their…

Machine Learning · Computer Science 2024-08-08 Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang

Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models

Large Language Models (LLMs) with billions of parameters are prime targets for network pruning, removing some model weights without hurting performance. Prior approaches such as magnitude pruning, SparseGPT, and Wanda, either concentrated…

Computation and Language · Computer Science 2024-04-10 Rocktim Jyoti Das, Mingjie Sun, Liqun Ma, Zhiqiang Shen

Shortened LLaMA: Depth Pruning for Large Language Models with Comparison of Retraining Methods

Structured pruning of modern large language models (LLMs) has emerged as a way of decreasing their high computational needs. Width pruning reduces the size of projection weight matrices (e.g., by removing attention heads) while maintaining…

Machine Learning · Computer Science 2024-06-25 Bo-Kyeong Kim, Geonmin Kim, Tae-Ho Kim, Thibault Castells, Shinkook Choi, Junho Shin, Hyoung-Kyu Song

Iterative Structured Pruning for Large Language Models with Multi-Domain Calibration

Large Language Models (LLMs) have achieved remarkable success across a wide spectrum of natural language processing tasks. However, their ever-growing scale introduces significant barriers to real-world deployment, including substantial…

Computation and Language · Computer Science 2026-01-07 Guangxin Wu, Hao Zhang, Zhang Zhibin, Jiafeng Guo, Xueqi Cheng

From LLMs to LRMs: Rethinking Pruning for Reasoning-Centric Models

Large language models (LLMs) are increasingly costly to deploy, motivating extensive research on model pruning. However, most existing studies focus on instruction-following LLMs, leaving it unclear whether established pruning strategies…

Machine Learning · Computer Science 2026-01-27 Longwei Ding, Anhao Zhao, Fanghua Ye, Ziyang Chen, Xiaoyu Shen

LLM-BIP: Structured Pruning for Large Language Models with Block-Wise Forward Importance Propagation

Large language models (LLMs) have demonstrated remarkable performance across various language tasks, but their widespread deployment is impeded by their large size and high computational costs. Structural pruning is a prevailing technique…

Computation and Language · Computer Science 2024-12-10 Haihang Wu

SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models

In spite of strong performance achieved by LLMs, the costs of their deployment are unaffordable. For the compression of LLMs, gradient-based pruning methods present promising effectiveness. However, in these methods, the gradient…

Computation and Language · Computer Science 2025-06-16 Hourun Zhu, Chengchao Shen

Efficient LLMs with AMP: Attention Heads and MLP Pruning

Deep learning drives a new wave in computing systems and triggers the automation of increasingly complex problems. In particular, Large Language Models (LLMs) have significantly advanced cognitive tasks, often matching or even surpassing…

Machine Learning · Computer Science 2025-05-01 Leandro Giusti Mugnaini, Bruno Lopes Yamamoto, Lucas Lauton de Alcantara, Victor Zacarias, Edson Bollis, Lucas Pellicer, Anna Helena Reali Costa, Artur Jordao

The Structural Scalpel: Automated Contiguous Layer Pruning for Large Language Models

Although large language models (LLMs) have achieved revolutionary breakthroughs in many fields, their large model size and high computational cost pose significant challenges for practical deployment on resource-constrained edge devices. To…

Machine Learning · Computer Science 2025-10-29 Yao Lu, Yuqi Li, Wenbin Xie, Shanqing Yu, Qi Xuan, Zhaowei Zhu, Shiping Wen

Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

The popularity of LLaMA (Touvron et al., 2023a;b) and other recently emerged moderate-sized large language models (LLMs) highlights the potential of building smaller yet powerful LLMs. Regardless, the cost of training such models from…

Computation and Language · Computer Science 2024-04-12 Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen

Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient

Recent Large Language Model (LLM) pruning methods typically operate at the post-training phase without expensive weight finetuning; however, their pruning criteria often rely on heuristically hand-crafted metrics, potentially leading…

Machine Learning · Computer Science 2025-07-04 Yuan Gao, Zujing Liu, Weizhong Zhang, Bo Du, Gui-Song Xia

GradPruner: Gradient-Guided Layer Pruning Enabling Efficient Fine-Tuning and Inference for LLMs

Fine-tuning Large Language Models (LLMs) with downstream data is often considered time-consuming and expensive. Structured pruning methods are primarily employed to improve the inference efficiency of pre-trained models. Meanwhile, they…

Computation and Language · Computer Science 2026-01-28 Wei Huang, Anda Cheng, Yinggui Wang

Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study

While Multimodal Large Language Models (MLLMs) demonstrate impressive capabilities, their substantial computational and memory requirements pose significant barriers to practical deployment. Current parameter reduction techniques primarily…

Computation and Language · Computer Science 2025-07-29 Yiran Huang, Lukas Thede, Massimiliano Mancini, Wenjia Xu, Zeynep Akata

Large Language Models Are Overparameterized Text Encoders

Large language models (LLMs) demonstrate strong performance as text embedding models when finetuned with supervised contrastive training. However, their large size balloons inference time and memory requirements. In this paper, we show that…

Computation and Language · Computer Science 2024-10-21 Thennal D K, Tim Fischer, Chris Biemann

SlimGPT: Layer-wise Structured Pruning for Large Language Models

Large language models (LLMs) have garnered significant attention for their remarkable capabilities across various domains, whose vast parameter scales present challenges for practical deployment. Structured pruning is an effective method to…

Artificial Intelligence · Computer Science 2024-12-25 Gui Ling, Ziyang Wang, Yuliang Yan, Qingwen Liu