Related papers: DeepCuts: Single-Shot Interpretability based Pruni…

How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark

Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task. The recent "Sparsity May Cry" (SMC) benchmark put into question the…

Computation and Language · Computer Science 2023-12-22 Eldar Kurtic, Torsten Hoefler, Dan Alistarh

On Importance of Layer Pruning for Smaller BERT Models and Low Resource Languages

This study explores the effectiveness of layer pruning for developing more efficient BERT models tailored to specific downstream tasks in low-resource languages. Our primary objective is to evaluate whether pruned BERT models can maintain…

Computation and Language · Computer Science 2025-01-03 Mayur Shirke, Amey Shembade, Madhushri Wagh, Pavan Thorat, Raviraj Joshi

Dissecting Pruned Neural Networks

Pruning is a standard technique for removing unnecessary structure from a neural network to reduce its storage footprint, computational demands, or energy consumption. Pruning can reduce the parameter-counts of many state-of-the-art neural…

Machine Learning · Computer Science 2019-07-02 Jonathan Frankle, David Bau

FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models

Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs). However, such models contain billions of parameters making large compute a necessity, while raising environmental concerns. To…

Machine Learning · Computer Science 2024-10-22 Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi

FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing

The rapid proliferation of large language models (LLMs) in natural language processing (NLP) has created a critical need for techniques that enable efficient deployment on memory-constrained devices without compromising performance. We…

Computation and Language · Computer Science 2025-02-03 James Seale Smith, Chi-Heng Lin, Shikhar Tuli, Haris Jeelani, Shangqian Gao, Yilin Shen, Hongxia Jin, Yen-Chang Hsu

GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models

Large language models (LLMs) are expensive to serve because model parameters, attention computation, and KV caches impose substantial memory and latency costs. We present GRASPrune, a structured pruning framework applied after pretraining…

Artificial Intelligence · Computer Science 2026-04-22 Ziyang Wang, Jiangfeng Xiao, Chuan Xiao, Ruoxiang Li, Rui Mao, Jianbin Qin

STAT: Shrinking Transformers After Training

We present STAT: a simple algorithm to prune transformer models without any fine-tuning. STAT eliminates both attention heads and neurons from the network, while preserving accuracy by calculating a correction to the weights of the next…

Machine Learning · Computer Science 2024-06-04 Megan Flynn, Alexander Wang, Dean Edward Alvarez, Christopher De Sa, Anil Damle

Component-Aware Pruning Framework for Neural Network Controllers via Gradient-Based Importance Estimation

The transition from monolithic to multi-component neural architectures in advanced neural network controllers poses substantial challenges due to the high computational complexity of the latter. Conventional model compression techniques for…

Machine Learning · Computer Science 2026-01-28 Ganesh Sundaram, Jonas Ulmen, Daniel Görges

PGB: One-Shot Pruning for BERT via Weight Grouping and Permutation

Large pretrained language models such as BERT suffer from slow inference and high memory usage, due to their huge size. Recent approaches to compressing BERT rely on iterative pruning and knowledge distillation, which, however, are often…

Computation and Language · Computer Science 2025-02-07 Hyemin Lim, Jaeyeon Lee, Dong-Wan Choi

Adapting by Pruning: A Case Study on BERT

Adapting pre-trained neural models to downstream tasks has become the standard practice for obtaining high-quality models. In this work, we propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in…

Machine Learning · Computer Science 2021-05-10 Yang Gao, Nicolo Colombo, Wei Wang

LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models

The evolving capabilities of large language models are accompanied by growing sizes and deployment costs, necessitating effective inference optimisation techniques. We propose a novel pruning method utilising centrality measures from graph…

Machine Learning · Computer Science 2024-12-02 David Hoffmann, Kailash Budhathoki, Matthaeus Kleindessner

Towards Building Efficient Sentence BERT Models using Layer Pruning

This study examines the effectiveness of layer pruning in creating efficient Sentence BERT (SBERT) models. Our goal is to create smaller sentence embedding models that reduce complexity while maintaining strong embedding similarity. We…

Computation and Language · Computer Science 2024-09-24 Anushka Shelke, Riya Savant, Raviraj Joshi

Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

Pre-trained universal feature extractors, such as BERT for natural language processing and VGG for computer vision, have become effective methods for improving deep learning models without requiring more labeled data. While effective,…

Computation and Language · Computer Science 2020-05-18 Mitchell A. Gordon, Kevin Duh, Nicholas Andrews

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Transformer-based language models have become a key building block for natural language processing. While these models are extremely accurate, they can be too large and computationally intensive to run on standard deployments. A variety of…

Computation and Language · Computer Science 2022-10-19 Eldar Kurtic, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, Dan Alistarh

Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining

To remove redundant components of large language models (LLMs) without incurring significant computational costs, this work focuses on single-shot pruning without a retraining phase. We simplify the pruning process for Transformer-based…

Artificial Intelligence · Computer Science 2024-07-30 Jianwei Li, Yijun Dong, Qi Lei

Pruning by Explaining: A Novel Criterion for Deep Neural Network Pruning

The success of convolutional neural networks (CNNs) in various applications is accompanied by a significant increase in computation and parameter storage costs. Recent efforts to reduce these overheads involve pruning and compressing the…

Machine Learning · Computer Science 2021-03-15 Seul-Ki Yeom, Philipp Seegerer, Sebastian Lapuschkin, Alexander Binder, Simon Wiedemann, Klaus-Robert Müller, Wojciech Samek

One-Shot Pruning for Fast-adapting Pre-trained Models on Devices

Large-scale pre-trained models have been remarkably successful in resolving downstream tasks. Nonetheless, deploying these models on low-capability devices still requires an effective approach, such as model pruning. However, pruning the…

Computer Vision and Pattern Recognition · Computer Science 2023-07-11 Haiyan Zhao, Guodong Long

Numerical Pruning for Efficient Autoregressive Models

Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing. However, their impressive performance often incurs high…

Machine Learning · Computer Science 2024-12-18 Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu

Neural Network Pruning by Gradient Descent

The rapid increase in the parameters of deep learning models has led to significant costs, challenging computational efficiency and model interpretability. In this paper, we introduce a novel and straightforward neural network pruning…

Machine Learning · Computer Science 2023-11-23 Zhang Zhang, Ruyi Tao, Jiang Zhang

Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm

Transformer-based pre-trained language models have significantly improved the performance of various natural language processing (NLP) tasks in recent years. While effective and prevalent, these models are usually prohibitively large…

Computation and Language · Computer Science 2022-01-19 Dongkuan Xu, Ian E. H. Yen, Jinxi Zhao, Zhibin Xiao