Related papers: Two-Stage Regularization-Based Structured Pruning …

2SSP: A Two-Stage Framework for Structured Pruning of LLMs

We propose a novel Two-Stage framework for Structured Pruning (2SSP) for pruning Large Language Models (LLMs), which combines two different strategies of pruning, namely Width and Depth Pruning. The first stage (Width Pruning)…

Computation and Language · Computer Science 2025-08-19 Fabrizio Sandri, Elia Cunegatti, Giovanni Iacca

DReSS: Data-driven Regularized Structured Streamlining for Large Language Models

Large language models (LLMs) have achieved significant progress across various domains, but their increasing scale results in high computational and memory costs. Recent studies have revealed that LLMs exhibit sparsity, providing the…

Machine Learning · Computer Science 2025-07-01 Mingkuan Feng, Jinyang Wu, Shuai Zhang, Pengpeng Shao, Ruihan Jin, Zhengqi Wen, Jianhua Tao, Feihu Che

DLP: Dynamic Layerwise Pruning in Large Language Models

Pruning has recently been widely adopted to reduce the parameter scale and improve the inference efficiency of Large Language Models (LLMs). Mainstream pruning techniques often rely on uniform layerwise pruning strategies, which can lead to…

Computation and Language · Computer Science 2025-06-04 Yuli Chen, Bo Cheng, Jiale Han, Yingying Zhang, Yingting Li, Shuhao Zhang

Pruning Large Language Models by Identifying and Preserving Functional Networks

Structured pruning is one of the representative techniques for compressing large language models (LLMs) to reduce GPU memory consumption and accelerate inference speed. It offers significant practical value in improving the efficiency of…

Computation and Language · Computer Science 2025-08-08 Yiheng Liu, Junhao Ning, Sichen Xia, Xiaohui Gao, Ning Qiang, Bao Ge, Junwei Han, Xintao Hu

Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling

Large Language Models (LLMs) now exhibit remarkable reasoning capabilities through test-time compute scaling (TTS), with impressive performance across math and coding benchmarks. In parallel, research in model compression has developed…

Artificial Intelligence · Computer Science 2026-05-29 Ocean Monjur, Shahriar Kabir Nahin, Anshuman Chhabra

Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study

While Multimodal Large Language Models (MLLMs) demonstrate impressive capabilities, their substantial computational and memory requirements pose significant barriers to practical deployment. Current parameter reduction techniques primarily…

Computation and Language · Computer Science 2025-07-29 Yiran Huang, Lukas Thede, Massimiliano Mancini, Wenjia Xu, Zeynep Akata

Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

The popularity of LLaMA (Touvron et al., 2023a;b) and other recently emerged moderate-sized large language models (LLMs) highlights the potential of building smaller yet powerful LLMs. Regardless, the cost of training such models from…

Computation and Language · Computer Science 2024-04-12 Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen

Learned Threshold Pruning

This paper presents a novel differentiable method for unstructured weight pruning of deep neural networks. Our learned-threshold pruning (LTP) method learns per-layer thresholds via gradient descent, unlike conventional methods where they…

Machine Learning · Computer Science 2021-03-22 Kambiz Azarian, Yash Bhalgat, Jinwon Lee, Tijmen Blankevoort

DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models

Large Language Models (LLMs) have achieved remarkable success in various natural language processing tasks, including language modeling, understanding, and generation. However, the increased memory and computational costs associated with…

Computation and Language · Computer Science 2024-11-05 Shangqian Gao, Chi-Heng Lin, Ting Hua, Tang Zheng, Yilin Shen, Hongxia Jin, Yen-Chang Hsu

DarwinLM: Evolutionary Structured Pruning of Large Language Models

Large Language Models (LLMs) have achieved significant success across various NLP tasks. However, their massive computational costs limit their widespread use, particularly in real-time applications. Structured pruning offers an effective…

Machine Learning · Computer Science 2025-03-06 Shengkun Tang, Oliver Sieberling, Eldar Kurtic, Zhiqiang Shen, Dan Alistarh

Iterative Structured Pruning for Large Language Models with Multi-Domain Calibration

Large Language Models (LLMs) have achieved remarkable success across a wide spectrum of natural language processing tasks. However, their ever-growing scale introduces significant barriers to real-world deployment, including substantial…

Computation and Language · Computer Science 2026-01-07 Guangxin Wu, Hao Zhang, Zhang Zhibin, Jiafeng Guo, Xueqi Cheng

Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications in edge devices. To obtain SLMs with strong performance, conventional approaches either pre-train…

Machine Learning · Computer Science 2025-11-17 Rui Pan, Shivanshu Shekhar, Boyao Wang, Shizhe Diao, Jipeng Zhang, Xingyuan Pan, Renjie Pi, Tong Zhang

Towards Efficient Automatic Self-Pruning of Large Language Models

Despite exceptional capabilities, Large Language Models (LLMs) still face deployment challenges due to their enormous size. Post-training structured pruning is a promising solution that prunes LLMs without the need for retraining, reducing…

Machine Learning · Computer Science 2025-02-21 Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Fei Chao, Rongrong Ji

Streamlining Redundant Layers to Compress Large Language Models

This paper introduces LLM-Streamline, a pioneering work on layer pruning for large language models (LLMs). It is based on the observation that different layers have varying impacts on hidden states, enabling the identification of less…

Computation and Language · Computer Science 2025-01-28 Xiaodong Chen, Yuxuan Hu, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen

FASP: Fast and Accurate Structured Pruning of Large Language Models

The rapid increase in the size of large language models (LLMs) has significantly escalated their computational and memory demands, posing challenges for efficient deployment, especially on resource-constrained devices. Structured pruning…

Machine Learning · Computer Science 2025-01-17 Hanyu Hu, Pengxiang Zhao, Ping Li, Yi Zheng, Zhefeng Wang, Xiaoming Yuan

Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency

While Large Vision Language Models (LVLMs) demonstrate impressive capabilities, their substantial computational and memory requirements pose deployment challenges on resource-constrained edge devices. Current parameter reduction techniques…

Computation and Language · Computer Science 2026-04-28 Yiran Huang, Lukas Thede, Massimiliano Mancini, Wenjia Xu, Zeynep Akata

MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures

The remarkable performance of large language models (LLMs) in various language tasks has attracted considerable attention. However, the ever-increasing size of these models presents growing challenges for deployment and inference.…

Computation and Language · Computer Science 2025-02-21 Jiayu Qin, Jianchao Tan, Kefeng Zhang, Xunliang Cai, Wei Wang

CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information

The colossal parameters and computational overhead of Large Language Models (LLMs) challenge their real-world applications. Network pruning, which targets unstructured or structured sparsity by removing redundant parameters, has recently…

Computation and Language · Computer Science 2024-12-11 Yuxin Wang, Minghua Ma, Zekun Wang, Jingchang Chen, Huiming Fan, Liping Shan, Qing Yang, Dongliang Xu, Ming Liu, Bing Qin

Pruning Foundation Models for High Accuracy without Retraining

Despite the superior performance, it is challenging to deploy foundation models or large language models (LLMs) due to their massive parameters and computations. While pruning is a promising technique to reduce model size and accelerate the…

Machine Learning · Computer Science 2024-10-22 Pu Zhao, Fei Sun, Xuan Shen, Pinrui Yu, Zhenglun Kong, Yanzhi Wang, Xue Lin

LEAP: Learnable Pruning for Transformer-based Models

Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models. However, current pruning algorithms either only focus on one pruning category, e.g., structured…

Computation and Language · Computer Science 2022-05-24 Zhewei Yao, Xiaoxia Wu, Linjian Ma, Sheng Shen, Kurt Keutzer, Michael W. Mahoney, Yuxiong He