Related papers: LEAP: Learnable Pruning for Transformer-based Mode…

LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models

Unstructured sparsity is now natively accelerated by recent GPU kernels and dataflow hardware, shifting the bottleneck from inference execution to the pruning algorithm. State-of-the-art methods for unstructured LLM pruning are layer-wise…

Machine Learning · Computer Science · 2026-05-19 · Mohammad Mozaffari, Younes Hourri, Mohammad Rastegari, Mahyar Najibi

LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch

Structured pruning is a commonly used convolutional neural network (CNN) compression approach. Pruning rate setting is a fundamental problem in structured pruning. Most existing works introduce too many additional learnable parameters to…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-26 · Pucheng Zhai, Kailing Guo, Fang Liu, Xiaofen Xing, Xiangmin Xu

LOP: Learning Optimal Pruning for Efficient On-Demand MLLMs Scaling

Structural pruning techniques are essential for deploying multimodal large language models (MLLMs) across various hardware platforms, from edge devices to cloud servers. However, current pruning methods typically determine optimal…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-17 · Zhihan Zhang, Xiang Pan, Hongchen Wei, Zhenzhong Chen

Adaptive Pruning for Large Language Models with Structural Importance Awareness

The recent advancements in large language models (LLMs) have significantly improved language understanding and generation capabilities. However, it is difficult to deploy LLMs on resource-constrained edge devices due to their high…

Computation and Language · Computer Science · 2024-12-20 · Haotian Zheng, Jinke Ren, Yushan Sun, Ruichen Zhang, Wenbo Zhang, Zhen Li, Dusit Niyato, Shuguang Cui, Yatong Han

Automatic Attention Pruning: Improving and Automating Model Pruning using Attentions

Pruning is a promising approach to compress deep learning models in order to deploy them on resource-constrained edge devices. However, many existing pruning solutions are based on unstructured pruning, which yields models that cannot…

Machine Learning · Computer Science · 2023-03-16 · Kaiqi Zhao, Animesh Jain, Ming Zhao

Learned Threshold Pruning

This paper presents a novel differentiable method for unstructured weight pruning of deep neural networks. Our learned-threshold pruning (LTP) method learns per-layer thresholds via gradient descent, unlike conventional methods where they…

Machine Learning · Computer Science · 2021-03-22 · Kambiz Azarian, Yash Bhalgat, Jinwon Lee, Tijmen Blankevoort

SPAP: Structured Pruning via Alternating Optimization and Penalty Methods

The deployment of large language models (LLMs) is often constrained by their substantial computational and memory demands. While structured pruning presents a viable approach by eliminating entire network components, existing methods suffer…

Machine Learning · Computer Science · 2025-05-07 · Hanyu Hu, Xiaoming Yuan

Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima

Learning rate is one of the most important hyper-parameters that has a significant influence on neural network training. Learning rate schedules are widely used in real practice to adjust the learning rate according to pre-defined schedules…

Machine Learning · Computer Science · 2022-08-26 · Hengyu Liu, Qiang Fu, Lun Du, Tiancheng Zhang, Ge Yu, Shi Han, Dongmei Zhang

SLAP: Stratified Loss-based Pruning for On-Policy Data-Efficient Instruction Tuning

Instruction tuning has optimized the specialized capabilities of large language models (LLMs), but it often requires extensive datasets and prolonged training times. The challenge lies in developing specific capabilities by identifying…

Computation and Language · Computer Science · 2026-05-26 · Run Zou, Jianhang Ding, Yifan Ding, Wen Wu, Hao Chen, Renshu Gu

SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models

Large Language Models have achieved remarkable success across various natural language processing tasks, yet their high computational cost during inference remains a major bottleneck. This paper introduces Sparse Expert Activation Pruning…

Computation and Language · Computer Science · 2025-03-11 · Xun Liang, Hanyu Wang, Huayi Lai, Simin Niu, Shichao Song, Jiawei Yang, Jihao Zhao, Feiyu Xiong, Bo Tang, Zhiyu Li

Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning

As deep neural networks are growing in size and being increasingly deployed to more resource-limited devices, there has been a recent surge of interest in network pruning methods, which aim to remove less important weights or activations of…

Machine Learning · Computer Science · 2020-06-23 · Minyoung Song, Jaehong Yoon, Eunho Yang, Sung Ju Hwang

RAP: Runtime Adaptive Pruning for LLM Inference

Large language models (LLMs) excel at language understanding and generation, but their enormous computational and memory requirements hinder deployment. Compression offers a potential solution to mitigate these constraints. However, most…

Machine Learning · Computer Science · 2026-05-19 · Huanrong Liu, Chunlin Tian, Xuyang Wei, Qingbiao Li, Li Li

DLP: Dynamic Layerwise Pruning in Large Language Models

Pruning has recently been widely adopted to reduce the parameter scale and improve the inference efficiency of Large Language Models (LLMs). Mainstream pruning techniques often rely on uniform layerwise pruning strategies, which can lead to…

Computation and Language · Computer Science · 2025-06-04 · Yuli Chen, Bo Cheng, Jiale Han, Yingying Zhang, Yingting Li, Shuhao Zhang

AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent

To address the enormous size of Large Language Models (LLMs), model compression methods, such as quantization and pruning, are often deployed, especially on edge devices. In this work, we focus on layer-wise post-training quantization and…

Machine Learning · Computer Science · 2025-12-02 · Jing Liu, Toshiaki Koike-Akino, Ye Wang, Hassan Mansour, Matthew Brand

Hessian-Aware Pruning and Optimal Neural Implant

Pruning is an effective method to reduce the memory footprint and FLOPs associated with neural network models. However, existing structured-pruning methods often result in significant accuracy degradation for moderate pruning levels. To…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-23 · Shixing Yu, Zhewei Yao, Amir Gholami, Zhen Dong, Sehoon Kim, Michael W Mahoney, Kurt Keutzer

Two-Stage Regularization-Based Structured Pruning for LLMs

The deployment of large language models (LLMs) is largely hindered by their large number of parameters. Structural pruning has emerged as a promising solution. Prior structured pruning methods directly remove unimportant parameters based on…

Machine Learning · Computer Science · 2026-04-21 · Mingkuan Feng, Jinyang Wu, Siyuan Liu, Shuai Zhang, Hongjian Fang, Ruihan Jin, Feihu Che, Pengpeng Shao, Zhengqi Wen, Jianhua Tao

Reweighted Proximal Pruning for Large-Scale Language Representation

Recently, pre-trained language representation flourishes as the mainstay of the natural language understanding community, e.g., BERT. These pre-trained language representations can create state-of-the-art results on a wide range of…

Machine Learning · Computer Science · 2019-12-24 · Fu-Ming Guo, Sijia Liu, Finlay S. Mungall, Xue Lin, Yanzhi Wang

Fluctuation-based Adaptive Structured Pruning for Large Language Models

Network Pruning is a promising way to address the huge computing resource demands of the deployment and inference of Large Language Models (LLMs). Retraining-free is important for LLMs' pruning methods. However, almost all of the existing…

Computation and Language · Computer Science · 2023-12-20 · Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang

PIP: Perturbation-based Iterative Pruning for Large Language Models

The rapid increase in the parameter counts of Large Language Models (LLMs), which often reach into the billions or even trillions, presents significant challenges for their practical deployment, particularly in resource-constrained…

Machine Learning · Computer Science · 2025-11-18 · Yi Cao, Wei-Jie Xu, Yucheng Shen, Weijie Shi, Chi-Min Chan, Jianfeng Qu, Jiajie Xu

APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

Fine-tuning and inference with large Language Models (LM) are generally known to be expensive. Parameter-efficient fine-tuning over pretrained LMs reduces training memory by updating a small number of LM parameters but does not improve…

Computation and Language · Computer Science · 2024-06-05 · Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao