Related papers: Fluctuation-based Adaptive Structured Pruning for …

FASP: Fast and Accurate Structured Pruning of Large Language Models

The rapid increase in the size of large language models (LLMs) has significantly escalated their computational and memory demands, posing challenges for efficient deployment, especially on resource-constrained devices. Structured pruning…

Machine Learning · Computer Science 2025-01-17 Hanyu Hu, Pengxiang Zhao, Ping Li, Yi Zheng, Zhefeng Wang, Xiaoming Yuan

Adaptive Pruning for Large Language Models with Structural Importance Awareness

The recent advancements in large language models (LLMs) have significantly improved language understanding and generation capabilities. However, it is difficult to deploy LLMs on resource-constrained edge devices due to their high…

Computation and Language · Computer Science 2024-12-20 Haotian Zheng, Jinke Ren, Yushan Sun, Ruichen Zhang, Wenbo Zhang, Zhen Li, Dusit Niyato, Shuguang Cui, Yatong Han

SlimLLM: Accurate Structured Pruning for Large Language Models

Large language models (LLMs) have garnered significant attention and demonstrated impressive capabilities in a wide range of applications. However, due to their enormous computational costs, the deployment and application of LLMs are often…

Machine Learning · Computer Science 2025-05-30 Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang

Iterative Structured Pruning for Large Language Models with Multi-Domain Calibration

Large Language Models (LLMs) have achieved remarkable success across a wide spectrum of natural language processing tasks. However, their ever-growing scale introduces significant barriers to real-world deployment, including substantial…

Computation and Language · Computer Science 2026-01-07 Guangxin Wu, Hao Zhang, Zhang Zhibin, Jiafeng Guo, Xueqi Cheng

LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch

Structured pruning is a commonly used convolutional neural network (CNN) compression approach. Pruning rate setting is a fundamental problem in structured pruning. Most existing works introduce too many additional learnable parameters to…

Computer Vision and Pattern Recognition · Computer Science 2023-09-26 Pucheng Zhai, Kailing Guo, Fang Liu, Xiaofen Xing, Xiangmin Xu

Pruning Large Language Models by Identifying and Preserving Functional Networks

Structured pruning is one of the representative techniques for compressing large language models (LLMs) to reduce GPU memory consumption and accelerate inference speed. It offers significant practical value in improving the efficiency of…

Computation and Language · Computer Science 2025-08-08 Yiheng Liu, Junhao Ning, Sichen Xia, Xiaohui Gao, Ning Qiang, Bao Ge, Junwei Han, Xintao Hu

Reconstruct the Pruned Model without Any Retraining

Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost. This retraining-free paradigm involves (1) pruning…

Machine Learning · Computer Science 2024-07-19 Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang

DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models

Large Language Models (LLMs) have achieved remarkable success in various natural language processing tasks, including language modeling, understanding, and generation. However, the increased memory and computational costs associated with…

Computation and Language · Computer Science 2024-11-05 Shangqian Gao, Chi-Heng Lin, Ting Hua, Tang Zheng, Yilin Shen, Hongxia Jin, Yen-Chang Hsu

CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information

The colossal parameters and computational overhead of Large Language Models (LLMs) challenge their real-world applications. Network pruning, which targets unstructured or structured sparsity by removing redundant parameters, has recently…

Computation and Language · Computer Science 2024-12-11 Yuxin Wang, Minghua Ma, Zekun Wang, Jingchang Chen, Huiming Fan, Liping Shan, Qing Yang, Dongliang Xu, Ming Liu, Bing Qin

SLAP: Stratified Loss-based Pruning for On-Policy Data-Efficient Instruction Tuning

Instruction tuning has optimized the specialized capabilities of large language models (LLMs), but it often requires extensive datasets and prolonged training times. The challenge lies in developing specific capabilities by identifying…

Computation and Language · Computer Science 2026-05-26 Run Zou, Jianhang Ding, Yifan Ding, Wen Wu, Hao Chen, Renshu Gu

SPAP: Structured Pruning via Alternating Optimization and Penalty Methods

The deployment of large language models (LLMs) is often constrained by their substantial computational and memory demands. While structured pruning presents a viable approach by eliminating entire network components, existing methods suffer…

Machine Learning · Computer Science 2025-05-07 Hanyu Hu, Xiaoming Yuan

LLM-BIP: Structured Pruning for Large Language Models with Block-Wise Forward Importance Propagation

Large language models (LLMs) have demonstrated remarkable performance across various language tasks, but their widespread deployment is impeded by their large size and high computational costs. Structural pruning is a prevailing technique…

Computation and Language · Computer Science 2024-12-10 Haihang Wu

Lightweight and Post-Training Structured Pruning for On-Device Large Language Models

Considering the hardware-friendly characteristics and broad applicability, structured pruning has emerged as an efficient solution to reduce the resource demands of large language models (LLMs) on resource-constrained devices. Traditional…

Machine Learning · Computer Science 2025-01-28 Zihuai Xu, Yang Xu, Hongli Xu, Yunming Liao, Zhiwei Yao, Zuan Xie

Shortened LLaMA: Depth Pruning for Large Language Models with Comparison of Retraining Methods

Structured pruning of modern large language models (LLMs) has emerged as a way of decreasing their high computational needs. Width pruning reduces the size of projection weight matrices (e.g., by removing attention heads) while maintaining…

Machine Learning · Computer Science 2024-06-25 Bo-Kyeong Kim, Geonmin Kim, Tae-Ho Kim, Thibault Castells, Shinkook Choi, Junho Shin, Hyoung-Kyu Song

Efficient LLMs with AMP: Attention Heads and MLP Pruning

Deep learning drives a new wave in computing systems and triggers the automation of increasingly complex problems. In particular, Large Language Models (LLMs) have significantly advanced cognitive tasks, often matching or even surpassing…

Machine Learning · Computer Science 2025-05-01 Leandro Giusti Mugnaini, Bruno Lopes Yamamoto, Lucas Lauton de Alcantara, Victor Zacarias, Edson Bollis, Lucas Pellicer, Anna Helena Reali Costa, Artur Jordao

LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

Large Language Models (LLMs), such as LLaMA and T5, have shown exceptional performance across various tasks through fine-tuning. Although low-rank adaption (LoRA) has emerged to cheaply fine-tune these LLMs on downstream tasks, their…

Machine Learning · Computer Science 2024-08-08 Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang

NIRVANA: Structured pruning reimagined for large language models compression

Structured pruning of large language models (LLMs) offers substantial efficiency improvements by removing entire hidden units, yet current approaches often suffer from significant performance degradation, particularly in zero-shot settings,…

Machine Learning · Computer Science 2025-09-18 Mengting Ai, Tianxin Wei, Sirui Chen, Jingrui He

LOP: Learning Optimal Pruning for Efficient On-Demand MLLMs Scaling

Structural pruning techniques are essential for deploying multimodal large language models (MLLMs) across various hardware platforms, from edge devices to cloud servers. However, current pruning methods typically determine optimal…

Computer Vision and Pattern Recognition · Computer Science 2025-06-17 Zhihan Zhang, Xiang Pan, Hongchen Wei, Zhenzhong Chen

Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications in edge devices. To obtain SLMs with strong performance, conventional approaches either pre-train…

Machine Learning · Computer Science 2025-11-17 Rui Pan, Shivanshu Shekhar, Boyao Wang, Shizhe Diao, Jipeng Zhang, Xingyuan Pan, Renjie Pi, Tong Zhang

Frustratingly Easy Task-aware Pruning for Large Language Models

Pruning provides a practical solution to reduce the resources required to run large language models (LLMs) to benefit from their effective capabilities as well as control their cost for training and inference. Research on LLM pruning often…

Computation and Language · Computer Science 2025-10-28 Yuanhe Tian, Junjie Liu, Xican Yang, Haishan Ye, Yan Song