Related papers: Deterministic Differentiable Structured Pruning fo…

DLP: Dynamic Layerwise Pruning in Large Language Models

Pruning has recently been widely adopted to reduce the parameter scale and improve the inference efficiency of Large Language Models (LLMs). Mainstream pruning techniques often rely on uniform layerwise pruning strategies, which can lead to…

Computation and Language · Computer Science · 2025-06-04 · Yuli Chen, Bo Cheng, Jiale Han, Yingying Zhang, Yingting Li, Shuhao Zhang

DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models

Large Language Models (LLMs) have achieved remarkable success in various natural language processing tasks, including language modeling, understanding, and generation. However, the increased memory and computational costs associated with…

Computation and Language · Computer Science · 2024-11-05 · Shangqian Gao, Chi-Heng Lin, Ting Hua, Tang Zheng, Yilin Shen, Hongxia Jin, Yen-Chang Hsu

DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization

Large language models (LLMs) deliver impressive results but face challenges from increasing model sizes and computational costs. Structured pruning reduces model size and speeds up inference but often causes uneven degradation across…

Computation and Language · Computer Science · 2025-05-28 · Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Jing Li, Min Zhang, Zhaopeng Tu

MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures

The remarkable performance of large language models (LLMs) in various language tasks has attracted considerable attention. However, the ever-increasing size of these models presents growing challenges for deployment and inference.…

Computation and Language · Computer Science · 2025-02-21 · Jiayu Qin, Jianchao Tan, Kefeng Zhang, Xunliang Cai, Wei Wang

DarwinLM: Evolutionary Structured Pruning of Large Language Models

Large Language Models (LLMs) have achieved significant success across various NLP tasks. However, their massive computational costs limit their widespread use, particularly in real-time applications. Structured pruning offers an effective…

Machine Learning · Computer Science · 2025-03-06 · Shengkun Tang, Oliver Sieberling, Eldar Kurtic, Zhiqiang Shen, Dan Alistarh

Iterative Structured Pruning for Large Language Models with Multi-Domain Calibration

Large Language Models (LLMs) have achieved remarkable success across a wide spectrum of natural language processing tasks. However, their ever-growing scale introduces significant barriers to real-world deployment, including substantial…

Computation and Language · Computer Science · 2026-01-07 · Guangxin Wu, Hao Zhang, Zhang Zhibin, Jiafeng Guo, Xueqi Cheng

PDP: Parameter-free Differentiable Pruning is All You Need

DNN pruning is a popular way to reduce the size of a model, improve the inference latency, and minimize the power consumption on DNN accelerators. However, existing approaches might be too complex, expensive or ineffective to apply to a…

Machine Learning · Computer Science · 2023-11-21 · Minsik Cho, Saurabh Adya, Devang Naik

Structural Pruning for Diffusion Models

Generative modeling has recently undergone remarkable advancements, primarily propelled by the transformative implications of Diffusion Probabilistic Models (DPMs). The impressive capability of these models, however, often entails…

Machine Learning · Computer Science · 2023-10-03 · Gongfan Fang, Xinyin Ma, Xinchao Wang

Saliency-driven Dynamic Token Pruning for Large Language Models

Despite the recent success of large language models (LLMs), LLMs are particularly challenging in long-sequence inference scenarios due to the quadratic computational complexity of the attention mechanism. Inspired by the interpretability…

Computation and Language · Computer Science · 2025-04-10 · Yao Tao, Yehui Tang, Yun Wang, Mingjian Zhu, Hailin Hu, Yunhe Wang

Instruction-Following Pruning for Large Language Models

With the rapid scaling of large language models (LLMs), structured pruning has become a widely used technique to learn efficient, smaller models from larger ones, delivering superior performance compared to training similarly sized models…

Computation and Language · Computer Science · 2025-06-04 · Bairu Hou, Qibin Chen, Jianyu Wang, Guoli Yin, Chong Wang, Nan Du, Ruoming Pang, Shiyu Chang, Tao Lei

Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing

We introduce Probe Pruning (PP), a novel framework for online, dynamic, structured pruning of Large Language Models (LLMs) applied in a batch-wise manner. PP leverages the insight that not all samples and tokens contribute equally to the…

Computation and Language · Computer Science · 2025-02-24 · Qi Le, Enmao Diao, Ziyan Wang, Xinran Wang, Jie Ding, Li Yang, Ali Anwar

LOP: Learning Optimal Pruning for Efficient On-Demand MLLMs Scaling

Structural pruning techniques are essential for deploying multimodal large language models (MLLMs) across various hardware platforms, from edge devices to cloud servers. However, current pruning methods typically determine optimal…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-17 · Zhihan Zhang, Xiang Pan, Hongchen Wei, Zhenzhong Chen

Dynamic Probabilistic Pruning: A general framework for hardware-constrained pruning at different granularities

Unstructured neural network pruning algorithms have achieved impressive compression rates. However, the resulting - typically irregular - sparse matrices hamper efficient hardware implementations, leading to additional memory usage and…

Machine Learning · Computer Science · 2021-05-27 · Lizeth Gonzalez-Carabarin, Iris A. M. Huijben, Bastiaan S. Veeling, Alexandre Schmid, Ruud J. G. van Sloun

Prompt-based Depth Pruning of Large Language Models

Depth pruning aims to reduce the inference cost of a large language model without any hardware-specific complications, by simply removing several less important transformer blocks. However, our empirical findings suggest that the importance…

Computation and Language · Computer Science · 2025-06-13 · Juyun Wee, Minjae Park, Jaeho Lee

MDP: Multidimensional Vision Model Pruning with Latency Constraint

Current structural pruning methods face two significant limitations: (i) they often limit pruning to finer-grained levels like channels, making aggressive parameter reduction challenging, and (ii) they focus heavily on parameter and FLOP…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-04 · Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose M. Alvarez

Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient

Recent Large-Language Models (LLMs) pruning methods typically operate at the post-training phase without the expensive weight finetuning, however, their pruning criteria often rely on heuristically hand-crafted metrics, potentially leading…

Machine Learning · Computer Science · 2025-07-04 · Yuan Gao, Zujing Liu, Weizhong Zhang, Bo Du, Gui-Song Xia

DeMM: A Decoupled Matrix Multiplication Engine Supporting Relaxed Structured Sparsity

Deep Learning (DL) has achieved unprecedented success in various application domains. Meanwhile, model pruning has emerged as a viable solution to reduce the footprint of DL models in mobile applications, without compromising their…

Hardware Architecture · Computer Science · 2024-01-17 · Christodoulos Peltekis, Vasileios Titopoulos, Chrysostomos Nicopoulos, Giorgos Dimitrakopoulos

Adaptive Pruning for Large Language Models with Structural Importance Awareness

The recent advancements in large language models (LLMs) have significantly improved language understanding and generation capabilities. However, it is difficult to deploy LLMs on resource-constrained edge devices due to their high…

Computation and Language · Computer Science · 2024-12-20 · Haotian Zheng, Jinke Ren, Yushan Sun, Ruichen Zhang, Wenbo Zhang, Zhen Li, Dusit Niyato, Shuguang Cui, Yatong Han

Sample-aware Adaptive Structured Pruning for Large Language Models

Large language models (LLMs) have achieved outstanding performance in natural language processing, but enormous model sizes and high computational costs limit their practical deployment. Structured pruning can effectively reduce the…

Computation and Language · Computer Science · 2025-03-11 · Jun Kong, Xinge Ma, Jin Wang, Xuejie Zhang

Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs

Large language models (LLMs) have demonstrated impressive capabilities, but their enormous size poses significant challenges for deployment in real-world applications. To address this issue, researchers have sought to apply network pruning…

Machine Learning · Computer Science · 2025-07-28 · Chang Gao, Kang Zhao, Runqi Wang, Jianfei Chen, Liping Jing