Related papers: DeepCuts: Single-Shot Interpretability based Pruni…

200 papers

Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task. The recent "Sparsity May Cry" (SMC) benchmark put into question the…

Computation and Language · Computer Science 2023-12-22 Eldar Kurtic, Torsten Hoefler, Dan Alistarh

This study explores the effectiveness of layer pruning for developing more efficient BERT models tailored to specific downstream tasks in low-resource languages. Our primary objective is to evaluate whether pruned BERT models can maintain…

Computation and Language · Computer Science 2025-01-03 Mayur Shirke, Amey Shembade, Madhushri Wagh, Pavan Thorat, Raviraj Joshi

Pruning is a standard technique for removing unnecessary structure from a neural network to reduce its storage footprint, computational demands, or energy consumption. Pruning can reduce the parameter-counts of many state-of-the-art neural…

Machine Learning · Computer Science 2019-07-02 Jonathan Frankle, David Bau

Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs). However, such models contain billions of parameters, making large compute a necessity while raising environmental concerns. To…

Machine Learning · Computer Science 2024-10-22 Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi

The rapid proliferation of large language models (LLMs) in natural language processing (NLP) has created a critical need for techniques that enable efficient deployment on memory-constrained devices without compromising performance. We…

Computation and Language · Computer Science 2025-02-03 James Seale Smith, Chi-Heng Lin, Shikhar Tuli, Haris Jeelani, Shangqian Gao, Yilin Shen, Hongxia Jin, Yen-Chang Hsu

Large language models (LLMs) are expensive to serve because model parameters, attention computation, and KV caches impose substantial memory and latency costs. We present GRASPrune, a structured pruning framework applied after pretraining…

Artificial Intelligence · Computer Science 2026-04-22 Ziyang Wang, Jiangfeng Xiao, Chuan Xiao, Ruoxiang Li, Rui Mao, Jianbin Qin

We present STAT: a simple algorithm to prune transformer models without any fine-tuning. STAT eliminates both attention heads and neurons from the network, while preserving accuracy by calculating a correction to the weights of the next…

Machine Learning · Computer Science 2024-06-04 Megan Flynn, Alexander Wang, Dean Edward Alvarez, Christopher De Sa, Anil Damle

The transition from monolithic to multi-component neural architectures in advanced neural network controllers poses substantial challenges due to the high computational complexity of the latter. Conventional model compression techniques for…

Machine Learning · Computer Science 2026-01-28 Ganesh Sundaram, Jonas Ulmen, Daniel Görges

Large pretrained language models such as BERT suffer from slow inference and high memory usage, due to their huge size. Recent approaches to compressing BERT rely on iterative pruning and knowledge distillation, which, however, are often…

Computation and Language · Computer Science 2025-02-07 Hyemin Lim, Jaeyeon Lee, Dong-Wan Choi

Adapting pre-trained neural models to downstream tasks has become the standard practice for obtaining high-quality models. In this work, we propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in…

Machine Learning · Computer Science 2021-05-10 Yang Gao, Nicolo Colombo, Wei Wang

The evolving capabilities of large language models are accompanied by growing sizes and deployment costs, necessitating effective inference optimisation techniques. We propose a novel pruning method utilising centrality measures from graph…

Machine Learning · Computer Science 2024-12-02 David Hoffmann, Kailash Budhathoki, Matthaeus Kleindessner

This study examines the effectiveness of layer pruning in creating efficient Sentence BERT (SBERT) models. Our goal is to create smaller sentence embedding models that reduce complexity while maintaining strong embedding similarity. We…

Computation and Language · Computer Science 2024-09-24 Anushka Shelke, Riya Savant, Raviraj Joshi

Pre-trained universal feature extractors, such as BERT for natural language processing and VGG for computer vision, have become effective methods for improving deep learning models without requiring more labeled data. While effective,…

Computation and Language · Computer Science 2020-05-18 Mitchell A. Gordon, Kevin Duh, Nicholas Andrews

Transformer-based language models have become a key building block for natural language processing. While these models are extremely accurate, they can be too large and computationally intensive to run on standard deployments. A variety of…

Computation and Language · Computer Science 2022-10-19 Eldar Kurtic, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, Dan Alistarh

To remove redundant components of large language models (LLMs) without incurring significant computational costs, this work focuses on single-shot pruning without a retraining phase. We simplify the pruning process for Transformer-based…

Artificial Intelligence · Computer Science 2024-07-30 Jianwei Li, Yijun Dong, Qi Lei

The success of convolutional neural networks (CNNs) in various applications is accompanied by a significant increase in computation and parameter storage costs. Recent efforts to reduce these overheads involve pruning and compressing the…

Large-scale pre-trained models have been remarkably successful in resolving downstream tasks. Nonetheless, deploying these models on low-capability devices still requires an effective approach, such as model pruning. However, pruning the…

Computer Vision and Pattern Recognition · Computer Science 2023-07-11 Haiyan Zhao, Guodong Long

Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing. However, their impressive performance often incurs high…

Machine Learning · Computer Science 2024-12-18 Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu

The rapid increase in the parameters of deep learning models has led to significant costs, challenging computational efficiency and model interpretability. In this paper, we introduce a novel and straightforward neural network pruning…

Machine Learning · Computer Science 2023-11-23 Zhang Zhang, Ruyi Tao, Jiang Zhang

Transformer-based pre-trained language models have significantly improved the performance of various natural language processing (NLP) tasks in recent years. While effective and prevalent, these models are usually prohibitively large…

Computation and Language · Computer Science 2022-01-19 Dongkuan Xu, Ian E. H. Yen, Jinxi Zhao, Zhibin Xiao