Related papers: Neural Language Model Pruning for Automatic Speech…

Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding

Self-supervised speech representation learning (SSL) has shown to be effective in various downstream tasks, but SSL models are usually large and slow. Model compression techniques such as pruning aim to reduce the model size and computation…

Computation and Language · Computer Science 2023-03-01 Yifan Peng , Kwangyoun Kim , Felix Wu , Prashant Sridhar , Shinji Watanabe

Can pruning make Large Language Models more efficient?

Transformer models have revolutionized natural language processing with their unparalleled ability to grasp complex contextual relationships. However, the vast number of parameters in these models has raised concerns regarding computational…

Machine Learning · Computer Science 2023-10-10 Sia Gholami , Marwan Omar

Pruning Pre-trained Language Models with Principled Importance and Self-regularization

Iterative pruning is one of the most effective compression methods for pre-trained language models. We discovered that finding the optimal pruning decision is an equality-constrained 0-1 Integer Linear Programming problem. The solution to…

Computation and Language · Computer Science 2023-05-23 Siyu Ren , Kenny Q. Zhu

Large Language Model Pruning

We surely enjoy the larger the better models for their superior performance in the last couple of years when both the hardware and software support the birth of such extremely huge models. The applied fields include text mining and others.…

Computation and Language · Computer Science 2024-06-04 Hanjuan Huang , Hao-Jia Song , Hsing-Kuo Pao

Automatic Pruning of Fine-tuning Datasets for Transformer-based Language Models

Transformer-based language models have shown state-of-the-art performance on a variety of natural language understanding tasks. To achieve this performance, these models are first pre-trained on general corpus and then fine-tuned on…

Computation and Language · Computer Science 2024-07-15 Mohammadreza Tayaranian , Seyyed Hasan Mozafari , Brett H. Meyer , James J. Clark , Warren J. Gross

Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most of LLMs still adopt attention layers between all pairs of tokens…

Computation and Language · Computer Science 2024-06-03 Sotiris Anagnostidis , Dario Pavllo , Luca Biggio , Lorenzo Noci , Aurelien Lucchi , Thomas Hofmann

Structured Pruning of Large Language Models

Large language models have recently achieved state of the art performance across a wide variety of natural language tasks. Meanwhile, the size of these models and their latency have significantly increased, which makes their usage costly,…

Computation and Language · Computer Science 2021-03-30 Ziheng Wang , Jeremy Wohlwend , Tao Lei

Pruning Large Language Models via Accuracy Predictor

Large language models(LLMs) containing tens of billions of parameters (or even more) have demonstrated impressive capabilities in various NLP tasks. However, substantial model size poses challenges to training, inference, and deployment so…

Artificial Intelligence · Computer Science 2023-10-11 Yupeng Ji , Yibo Cao , Jiucai Liu

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

Many efforts have been made to facilitate natural language processing tasks with pre-trained language models (LMs), and brought significant improvements to various applications. To fully leverage the nearly unlimited corpora and capture…

Computation and Language · Computer Science 2018-09-11 Liyuan Liu , Xiang Ren , Jingbo Shang , Jian Peng , Jiawei Han

Revisiting Large Language Model Pruning using Neuron Semantic Attribution

Model pruning technique is vital for accelerating large language models by reducing their size and computational requirements. However, the generalizability of existing pruning methods across diverse datasets and tasks remains unclear.…

Computation and Language · Computer Science 2025-03-04 Yizhuo Ding , Xinwei Sun , Yanwei Fu , Guosheng Hu

Convexity-based Pruning of Speech Representation Models

Speech representation models based on the transformer architecture and trained by self-supervised learning have shown great promise for solving tasks such as speech and speaker recognition, keyword spotting, emotion detection, and more.…

Computation and Language · Computer Science 2024-11-25 Teresa Dorszewski , Lenka Tětková , Lars Kai Hansen

Adaptive Pruning of Deep Neural Networks for Resource-Aware Embedded Intrusion Detection on the Edge

Artificial neural network pruning is a method in which artificial neural network sizes can be reduced while attempting to preserve the predicting capabilities of the network. This is done to make the model smaller or faster during inference…

Machine Learning · Computer Science 2025-05-21 Alexandre Broggi , Nathaniel Bastian , Lance Fiondella , Gokhan Kul

Task-Agnostic Structured Pruning of Speech Representation Models

Self-supervised pre-trained models such as Wav2vec2, Hubert, and WavLM have been shown to significantly improve many speech tasks. However, their large memory and strong computational requirements hinder their industrial applicability.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-08 Haoyu Wang , Siyuan Wang , Wei-Qiang Zhang , Hongbin Suo , Yulong Wan

Self-calibration for Language Model Quantization and Pruning

Quantization and pruning are fundamental approaches for model compression, enabling efficient inference for language models. In a post-training setting, state-of-the-art quantization and pruning methods require calibration data, a small set…

Computation and Language · Computer Science 2025-07-15 Miles Williams , George Chrysostomou , Nikolaos Aletras

Hybrid Pruning: In-Situ Compression of Self-Supervised Speech Models for Speaker Verification and Anti-Spoofing

Although large-scale self-supervised learning (SSL) models like WavLM have achieved state-of-the-art performance in speech processing, their significant size impedes deployment on resource-constrained devices. While structured pruning is a…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-11 Junyi Peng , Lin Zhang , Jiangyu Han , Oldřich Plchot , Johan Rohdin , Themos Stafylakis , Shuai Wang , Jan Černocký

Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models

Deep learning models have achieved tremendous success in most of the industries in recent years. The evolution of these models has also led to an increase in the model size and energy requirement, making it difficult to deploy in production…

Machine Learning · Computer Science 2024-07-24 Aayush Saxena , Arit Kumar Bishwas , Ayush Ashok Mishra , Ryan Armstrong

Revisiting Data Augmentation in Model Compression: An Empirical and Comprehensive Study

The excellent performance of deep neural networks is usually accompanied by a large number of parameters and computations, which have limited their usage on the resource-limited edge devices. To address this issue, abundant methods such as…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Muzhou Yu , Linfeng Zhang , Kaisheng Ma

Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications in edge devices. To obtain SLMs with strong performance, conventional approaches either pre-train…

Machine Learning · Computer Science 2025-11-17 Rui Pan , Shivanshu Shekhar , Boyao Wang , Shizhe Diao , Jipeng Zhang , Xingyuan Pan , Renjie Pi , Tong Zhang

Pruning General Large Language Models into Customized Expert Models

Large language models (LLMs) have revolutionized natural language processing, yet their substantial model sizes often require substantial computational resources. To preserve computing resources and accelerate inference speed, it is crucial…

Computation and Language · Computer Science 2025-06-04 Yirao Zhao , Guizhen Chen , Kenji Kawaguchi , Lidong Bing , Wenxuan Zhang

High-Layer Attention Pruning with Rescaling

Pruning is a highly effective approach for compressing large language models (LLMs), significantly reducing inference latency. However, conventional training-free structured pruning methods often employ a heuristic metric that…

Computation and Language · Computer Science 2026-01-28 Songtao Liu , Peng Liu