Related papers: Eigenpruning: an Interpretability-Inspired PEFT Me…

AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

Recent work on pruning large language models (LLMs) has shown that one can eliminate a large number of parameters without compromising performance, making pruning a promising strategy to reduce LLM model size. Existing LLM pruning…

Machine Learning · Computer Science 2024-10-16 Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang

Weight Spectra Induced Efficient Model Adaptation

Large-scale foundation models have demonstrated remarkable versatility across a wide range of downstream tasks. However, fully fine-tuning these models incurs prohibitive computational costs, motivating the development of…

Machine Learning · Computer Science 2025-05-30 Chongjie Si, Xuankun Yang, Muqing Liu, Yadao Wang, Xiaokang Yang, Wenbo Su, Bo Zheng, Wei Shen

PrunePEFT: Iterative Hybrid Pruning for Parameter-Efficient Fine-tuning of LLMs

Parameter Efficient Fine-Tuning (PEFT) methods have emerged as effective and promising approaches for fine-tuning pre-trained language models. Compared with Full parameter Fine-Tuning (FFT), PEFT achieved comparable task performance with a…

Machine Learning · Computer Science 2025-06-10 Tongzhou Yu, Zhuhao Zhang, Guanghui Zhu, Shen Jiang, Meikang Qiu, Yihua Huang

The Unreasonable Ineffectiveness of the Deeper Layers

How is knowledge stored in an LLM's weights? We study this via layer pruning: if removing a certain layer does not affect model performance in common question-answering benchmarks, then the weights in that layer are not necessary for…

Computation and Language · Computer Science 2025-03-04 Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning

Parameter-efficient fine-tuning (PEFT) has emerged as the predominant technique for fine-tuning in the era of large language models. However, existing PEFT methods still have inadequate training efficiency. Firstly, the utilization of…

Computation and Language · Computer Science 2024-06-07 Naibin Gu, Peng Fu, Xiyu Liu, Bowen Shen, Zheng Lin, Weiping Wang

Elimination-compensation pruning for fully-connected neural networks

The unmatched ability of Deep Neural Networks in capturing complex patterns in large and noisy datasets is often associated with their large hypothesis space, and consequently to the vast amount of parameters that characterize model…

Machine Learning · Computer Science 2026-02-25 Enrico Ballini, Luca Muscarnera, Alessio Fumagalli, Anna Scotti, Francesco Regazzoni

LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models

The evolving capabilities of large language models are accompanied by growing sizes and deployment costs, necessitating effective inference optimisation techniques. We propose a novel pruning method utilising centrality measures from graph…

Machine Learning · Computer Science 2024-12-02 David Hoffmann, Kailash Budhathoki, Matthaeus Kleindessner

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants. However, their growing capabilities come at the cost of extremely large model…

Machine Learning · Computer Science 2025-02-27 Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou

Pruning Pre-trained Language Models with Principled Importance and Self-regularization

Iterative pruning is one of the most effective compression methods for pre-trained language models. We discovered that finding the optimal pruning decision is an equality-constrained 0-1 Integer Linear Programming problem. The solution to…

Computation and Language · Computer Science 2023-05-23 Siyu Ren, Kenny Q. Zhu

Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models

Fine-tuning large language models (LLMs) on downstream tasks requires substantial computational resources. Selective PEFT, a class of parameter-efficient fine-tuning (PEFT) methodologies, aims to mitigate these computational challenges by…

Computation and Language · Computer Science 2025-06-24 Aradhye Agarwal, Suhas K Ramesh, Ayan Sengupta, Tanmoy Chakraborty

AutoLR: Layer-wise Pruning and Auto-tuning of Learning Rates in Fine-tuning of Deep Networks

Existing fine-tuning methods use a single learning rate over all layers. In this paper, first, we discuss that trends of layer-wise weight variations by fine-tuning using a single learning rate do not match the well-known notion that…

Computer Vision and Pattern Recognition · Computer Science 2021-01-05 Youngmin Ro, Jin Young Choi

Frustratingly Easy Task-aware Pruning for Large Language Models

Pruning provides a practical solution to reduce the resources required to run large language models (LLMs) to benefit from their effective capabilities as well as control their cost for training and inference. Research on LLM pruning often…

Computation and Language · Computer Science 2025-10-28 Yuanhe Tian, Junjie Liu, Xican Yang, Haishan Ye, Yan Song

Learned Threshold Pruning

This paper presents a novel differentiable method for unstructured weight pruning of deep neural networks. Our learned-threshold pruning (LTP) method learns per-layer thresholds via gradient descent, unlike conventional methods where they…

Machine Learning · Computer Science 2021-03-22 Kambiz Azarian, Yash Bhalgat, Jinwon Lee, Tijmen Blankevoort

Adapting by Pruning: A Case Study on BERT

Adapting pre-trained neural models to downstream tasks has become the standard practice for obtaining high-quality models. In this work, we propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in…

Machine Learning · Computer Science 2021-05-10 Yang Gao, Nicolo Colombo, Wei Wang

Propulsion: Steering LLM with Tiny Fine-Tuning

The rapid advancements in Large Language Models (LLMs) have revolutionized natural language processing (NLP) and related fields. However, fine-tuning these models for specific tasks remains computationally expensive and risks degrading…

Computation and Language · Computer Science 2024-12-17 Md Kowsher, Nusrat Jahan Prottasha, Prakash Bhat

Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability

The exponential growth of large language models (LLMs) like ChatGPT has revolutionized artificial intelligence, offering unprecedented capabilities in natural language processing. However, the extensive computational resources required for…

Computation and Language · Computer Science 2025-02-25 Ashhadul Islam, Samir Brahim Belhaouari, Amine Bermak

Pruning by Explaining: A Novel Criterion for Deep Neural Network Pruning

The success of convolutional neural networks (CNNs) in various applications is accompanied by a significant increase in computation and parameter storage costs. Recent efforts to reduce these overheads involve pruning and compressing the…

Machine Learning · Computer Science 2021-03-15 Seul-Ki Yeom, Philipp Seegerer, Sebastian Lapuschkin, Alexander Binder, Simon Wiedemann, Klaus-Robert Müller, Wojciech Samek

Linearization Explains Fine-Tuning in Large Language Models

Parameter-Efficient Fine-Tuning (PEFT) is a popular class of techniques that strive to adapt large models in a scalable and resource-efficient manner. Yet, the mechanisms underlying their training performance and generalization remain…

Machine Learning · Computer Science 2026-02-10 Zahra Rahimi Afzal, Tara Esmaeilbeig, Mojtaba Soltanalian, Mesrob I. Ohannessian

Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining

To remove redundant components of large language models (LLMs) without incurring significant computational costs, this work focuses on single-shot pruning without a retraining phase. We simplify the pruning process for Transformer-based…

Artificial Intelligence · Computer Science 2024-07-30 Jianwei Li, Yijun Dong, Qi Lei

Fewer Weights, More Problems: A Practical Attack on LLM Pruning

Model pruning, i.e., removing a subset of model weights, has become a prominent approach to reducing the memory footprint of large language models (LLMs) during inference. Notably, popular inference engines, such as vLLM, enable users to…

Machine Learning · Computer Science 2026-04-07 Kazuki Egashira, Robin Staab, Thibaud Gloaguen, Mark Vero, Martin Vechev