Related papers: Task-oriented Memory-efficient Pruning-Adapter

APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

Fine-tuning and inference with large language models (LMs) are generally known to be expensive. Parameter-efficient fine-tuning over pretrained LMs reduces training memory by updating a small number of LM parameters but does not improve…

Computation and Language · Computer Science · 2024-06-05 · Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao

Adapting by Pruning: A Case Study on BERT

Adapting pre-trained neural models to downstream tasks has become the standard practice for obtaining high-quality models. In this work, we propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in…

Machine Learning · Computer Science · 2021-05-10 · Yang Gao, Nicolo Colombo, Wei Wang

Can pruning make Large Language Models more efficient?

Transformer models have revolutionized natural language processing with their unparalleled ability to grasp complex contextual relationships. However, the vast number of parameters in these models has raised concerns regarding computational…

Machine Learning · Computer Science · 2023-10-10 · Sia Gholami, Marwan Omar

Transfer Learning for Structured Pruning under Limited Task Data

Large, pre-trained models are problematic to use in resource constrained applications. Fortunately, task-aware structured pruning methods offer a solution. These approaches reduce model size by dropping structural units like layers and…

Computation and Language · Computer Science · 2023-11-14 · Lucio Dery, David Grangier, Awni Hannun

Neural Language Model Pruning for Automatic Speech Recognition

We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their…

Machine Learning · Computer Science · 2023-10-06 · Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, Markus Nußbaum-Thom, Youssef Oualil

Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models

The pruning objective has recently extended beyond accuracy and sparsity to robustness in language models. Despite this, existing methods struggle to enhance robustness against adversarial attacks when continually increasing model sparsity…

Computation and Language · Computer Science · 2024-01-12 · Jianwei Li, Qi Lei, Wei Cheng, Dongkuan Xu

IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining

Recent advancements in large language models have intensified the need for efficient and deployable models within limited inference budgets. Structured pruning pipelines have shown promise in token efficiency compared to training…

Computation and Language · Computer Science · 2025-03-11 · Yixiao Li, Xianzhi Du, Ajay Jaiswal, Tao Lei, Tuo Zhao, Chong Wang, Jianyu Wang

Movement Pruning: Adaptive Sparsity by Fine-Tuning

Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing…

Computation and Language · Computer Science · 2020-10-26 · Victor Sanh, Thomas Wolf, Alexander M. Rush

Task-specific Compression for Multi-task Language Models using Attribution-based Pruning

Multi-task language models show outstanding performance for various natural language understanding tasks with only a single model. However, these language models utilize an unnecessarily large number of model parameters, even when used only…

Computation and Language · Computer Science · 2023-02-14 · Nakyeong Yang, Yunah Jang, Hwanhee Lee, Seohyeong Jung, Kyomin Jung

Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications on edge devices. To obtain SLMs with strong performance, conventional approaches either pre-train…

Machine Learning · Computer Science · 2025-11-17 · Rui Pan, Shivanshu Shekhar, Boyao Wang, Shizhe Diao, Jipeng Zhang, Xingyuan Pan, Renjie Pi, Tong Zhang

Parameter-Efficient Transfer Learning with Diff Pruning

While task-specific finetuning of pretrained networks has led to significant empirical advances in NLP, the large size of networks makes finetuning difficult to deploy in multi-task, memory-constrained settings. We propose diff pruning as a…

Computation and Language · Computer Science · 2021-06-10 · Demi Guo, Alexander M. Rush, Yoon Kim

Memory-based Parameter Adaptation

Deep neural networks have excelled on a wide range of problems, from vision to language and game playing. Neural networks very gradually incorporate information into weights as they process data, requiring very low learning rates. If the…

Machine Learning · Statistics · 2018-03-01 · Pablo Sprechmann, Siddhant M. Jayakumar, Jack W. Rae, Alexander Pritzel, Adrià Puigdomènech Badia, Benigno Uria, Oriol Vinyals, Demis Hassabis, Razvan Pascanu, Charles Blundell

Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning

Pre-trained large-scale language models have increasingly demonstrated high accuracy on many natural language processing (NLP) tasks. However, the limited weight storage and computational speed on hardware platforms have impeded the…

Computation and Language · Computer Science · 2020-11-18 · Bingbing Li, Zhenglun Kong, Tianyun Zhang, Ji Li, Zhengang Li, Hang Liu, Caiwen Ding

Efficient Post-Training Pruning of Large Language Models with Statistical Correction

Post-training pruning is an effective approach for reducing the size and inference cost of large language models (LLMs), but existing methods often face a trade-off between pruning quality and computational efficiency. Heuristic pruning…

Computation and Language · Computer Science · 2026-02-10 · Peiqi Yu, Jinhao Wang, Xinyi Sui, Nam Ling, Wei Wang, Wei Jiang

On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation

Adapter-based tuning has recently arisen as an alternative to fine-tuning. It works by adding light-weight adapter modules to a pretrained language model (PrLM) and only updating the parameters of adapter modules when learning on a…

Computation and Language · Computer Science · 2021-06-08 · Ruidan He, Linlin Liu, Hai Ye, Qingyu Tan, Bosheng Ding, Liying Cheng, Jia-Wei Low, Lidong Bing, Luo Si

ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models

Aligning general-purpose large language models (LLMs) to downstream tasks often incurs significant training adjustment costs. Prior research has explored various avenues to enhance alignment efficiency, primarily through minimal-data…

Computation and Language · Computer Science · 2025-06-19 · Hao Chen, Haoze Li, Zhiqing Xiao, Lirong Gao, Qi Zhang, Xiaomeng Hu, Ningtao Wang, Xing Fu, Junbo Zhao

Adapter Pruning using Tropical Characterization

Adapters are widely popular parameter-efficient transfer learning approaches in natural language processing that insert trainable modules in between layers of a pre-trained language model. Apart from several heuristics, however, there has…

Computation and Language · Computer Science · 2023-10-31 · Rishabh Bhardwaj, Tushar Vaidya, Soujanya Poria

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants. However, their growing capabilities come at the cost of extremely large model…

Machine Learning · Computer Science · 2025-02-27 · Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou

Frustratingly Easy Task-aware Pruning for Large Language Models

Pruning provides a practical solution to reduce the resources required to run large language models (LLMs) to benefit from their effective capabilities as well as control their cost for training and inference. Research on LLM pruning often…

Computation and Language · Computer Science · 2025-10-28 · Yuanhe Tian, Junjie Liu, Xican Yang, Haishan Ye, Yan Song

Neural Network Panning: Screening the Optimal Sparse Network Before Training

Pruning neural networks before training not only compresses the original models but also accelerates the network training phase, which has substantial application value. The current work focuses on fine-grained pruning, which uses…

Machine Learning · Computer Science · 2022-09-28 · Xiatao Kang, Ping Li, Jiayi Yao, Chengxi Li