Related papers: A Fast Post-Training Pruning Framework for Transfo…

Pruning Foundation Models for High Accuracy without Retraining

Despite the superior performance, it is challenging to deploy foundation models or large language models (LLMs) due to their massive parameters and computations. While pruning is a promising technique to reduce model size and accelerate the…

Machine Learning · Computer Science · 2024-10-22 · Pu Zhao, Fei Sun, Xuan Shen, Pinrui Yu, Zhenglun Kong, Yanzhi Wang, Xue Lin

PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration

State-of-the-art convolutional neural networks (CNNs) used in vision applications have large models with numerous weights. Training these models is very compute- and memory-resource intensive. Much research has been done on pruning or…

Machine Learning · Computer Science · 2019-12-10 · Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez

Block Pruning For Faster Transformers

Pre-training has improved model accuracy for both classification and generation tasks at the cost of introducing much larger and slower models. Pruning methods have proven to be an effective way of reducing model size, whereas distillation…

Machine Learning · Computer Science · 2021-09-13 · François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush

STAT: Shrinking Transformers After Training

We present STAT: a simple algorithm to prune transformer models without any fine-tuning. STAT eliminates both attention heads and neurons from the network, while preserving accuracy by calculating a correction to the weights of the next…

Machine Learning · Computer Science · 2024-06-04 · Megan Flynn, Alexander Wang, Dean Edward Alvarez, Christopher De Sa, Anil Damle

Learnable Sparsity for Vision Generative Models

Diffusion models have achieved impressive advancements in various vision tasks. However, these gains often rely on increasing model size, which escalates computational complexity and memory demands, complicating deployment, raising…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-06 · Yang Zhang, Er Jin, Wenzhong Liang, Yanfei Dong, Ashkan Khakzar, Philip Torr, Johannes Stegmaier, Kenji Kawaguchi

Gradient-Free Structured Pruning with Unlabeled Data

Large Language Models (LLMs) have achieved great success in solving difficult tasks across many domains, but such success comes with high computation cost and inference latency. As developers and third parties customize these models, the…

Machine Learning · Computer Science · 2023-07-18 · Azade Nova, Hanjun Dai, Dale Schuurmans

Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining

To remove redundant components of large language models (LLMs) without incurring significant computational costs, this work focuses on single-shot pruning without a retraining phase. We simplify the pruning process for Transformer-based…

Artificial Intelligence · Computer Science · 2024-07-30 · Jianwei Li, Yijun Dong, Qi Lei

A One-step Pruning-recovery Framework for Acceleration of Convolutional Neural Networks

Acceleration of convolutional neural networks has received increasing attention during the past several years. Among various acceleration techniques, filter pruning has its inherent merit by effectively reducing the number of convolution…

Computer Vision and Pattern Recognition · Computer Science · 2019-06-19 · Dong Wang, Lei Zhou, Xiao Bai, Jun Zhou

Weight Pruning via Adaptive Sparsity Loss

Pruning neural networks has regained interest in recent years as a means to compress state-of-the-art deep neural networks and enable their deployment on resource-constrained devices. In this paper, we propose a robust compressive learning…

Machine Learning · Computer Science · 2020-06-05 · George Retsinas, Athena Elafrou, Georgios Goumas, Petros Maragos

SparseSwaps: Tractable LLM Pruning Mask Refinement at Scale

The resource requirements of neural networks can be significantly reduced through pruning - the removal of seemingly less important parameters. However, for LLMs, full retraining to recover pruning-induced performance degradation is often…

Machine Learning · Computer Science · 2026-02-03 · Max Zimmer, Christophe Roux, Moritz Wagner, Deborah Hendrych, Sebastian Pokutta

AutoSculpt: A Pattern-based Model Auto-pruning Framework Using Reinforcement Learning and Graph Learning

As deep neural networks (DNNs) are increasingly deployed on edge devices, optimizing models for constrained computational resources is critical. Existing auto-pruning methods face challenges due to the diversity of DNN models, various…

Artificial Intelligence · Computer Science · 2026-04-21 · Lixian Jing, Jianpeng Qi, Junyu Dong, Yanwei Yu

MultiPruner: Balanced Structure Removal in Foundation Models

Recently, state-of-the-art approaches for pruning large pre-trained models (LPMs) have demonstrated that the training-free removal of non-critical residual blocks in Transformers is viable for reducing model size, achieving results that…

Machine Learning · Computer Science · 2025-01-20 · J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain

Dynamic Model Pruning with Feedback

Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only due to high memory requirements but also because of increased latency at inference. We propose a novel model compression…

Machine Learning · Computer Science · 2020-06-15 · Tao Lin, Sebastian U. Stich, Luis Barba, Daniil Dmitriev, Martin Jaggi

Post-training deep neural network pruning via layer-wise calibration

We present a post-training weight pruning method for deep neural networks that achieves accuracy levels tolerable for the production setting and that is sufficiently fast to be run on commodity hardware such as desktop CPUs or edge devices.…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-03 · Ivan Lazarevich, Alexander Kozlov, Nikita Malinin

Numerical Pruning for Efficient Autoregressive Models

Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing. However, their impressive performance often incurs high…

Machine Learning · Computer Science · 2024-12-18 · Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu

Single Shot Structured Pruning Before Training

We introduce a method to speed up training by 2x and inference by 3x in deep neural networks using structured pruning applied before training. Unlike previous works on pruning before training which prune individual weights, our work…

Machine Learning · Computer Science · 2020-07-02 · Joost van Amersfoort, Milad Alizadeh, Sebastian Farquhar, Nicholas Lane, Yarin Gal

One-Shot Pruning for Fast-adapting Pre-trained Models on Devices

Large-scale pre-trained models have been remarkably successful in resolving downstream tasks. Nonetheless, deploying these models on low-capability devices still requires an effective approach, such as model pruning. However, pruning the…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-11 · Haiyan Zhao, Guodong Long

Prune Once for All: Sparse Pre-Trained Language Models

Transformer-based language models are applied to a wide range of applications in natural language processing. However, they are inefficient and difficult to deploy. In recent years, many compression algorithms have been proposed to increase…

Computation and Language · Computer Science · 2021-11-11 · Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat

PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs

Neural Networks can be effectively compressed through pruning, significantly reducing storage and compute demands while maintaining predictive performance. Simple yet effective methods like magnitude pruning remove less important parameters…

Machine Learning · Computer Science · 2025-12-03 · Max Zimmer, Megi Andoni, Christoph Spiegel, Sebastian Pokutta

Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads

Deep pre-trained Transformer models have achieved state-of-the-art results over a variety of natural language processing (NLP) tasks. By learning rich language knowledge with millions of parameters, these models are usually…

Computation and Language · Computer Science · 2020-11-10 · Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun