Related papers: Task-specific Compression for Multi-task Language …

200 papers

Neuron pruning is widely used to reduce the computational cost and parameter footprint of large language models, yet it remains unclear whether neurons in task-specific models contribute uniformly to task performance. In this work, we…

Model pruning is a vital technique for accelerating large language models by reducing their size and computational requirements. However, the generalizability of existing pruning methods across diverse datasets and tasks remains unclear.…

Computation and Language · Computer Science 2025-03-04 Yizhuo Ding, Xinwei Sun, Yanwei Fu, Guosheng Hu

Pruning provides a practical solution to reduce the resources required to run large language models (LLMs) to benefit from their effective capabilities as well as control their cost for training and inference. Research on LLM pruning often…

Computation and Language · Computer Science 2025-10-28 Yuanhe Tian, Junjie Liu, Xican Yang, Haishan Ye, Yan Song

Recently, multi-task spoken language understanding (SLU) models have emerged, designed to address various speech processing tasks. However, these models often rely on a large number of parameters. Also, they often encounter difficulties in…

Computation and Language · Computer Science 2024-06-19 Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe

We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their…

Machine Learning · Computer Science 2023-10-06 Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, Markus Nußbaum-Thom, Youssef Oualil

Deep pre-trained Transformer models have achieved state-of-the-art results over a variety of natural language processing (NLP) tasks. By learning rich language knowledge with millions of parameters, these models are usually…

Computation and Language · Computer Science 2020-11-10 Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun

Neural Machine Translation (NMT), like many other deep learning domains, typically suffers from over-parameterization, resulting in large storage sizes. This paper examines three simple magnitude-based pruning schemes to compress NMT…

Artificial Intelligence · Computer Science 2016-07-01 Abigail See, Minh-Thang Luong, Christopher D. Manning

Neural network pruning has been established as a driving force in the exploration of memory- and energy-efficient solutions with high throughput both during training and at test time. In this paper, we introduce a novel criterion for model…

Machine Learning · Computer Science 2025-12-09 Angelos-Christos Maroudis, Sotirios Xydis

Although multi-task deep neural network (DNN) models have computation and storage benefits over individual single-task DNN models, they can be further optimized via model compression. Numerous structured pruning methods are already…

Machine Learning · Computer Science 2023-04-17 Siddhant Garg, Lijun Zhang, Hui Guan

We present a filter pruning approach for deep model compression, using a multitask network. Our approach is based on learning a pruner network to prune a pre-trained target network. The pruner is essentially a multitask deep neural…

Computer Vision and Pattern Recognition · Computer Science 2020-01-17 Vinay Kumar Verma, Pravendra Singh, Vinay P. Namboodiri, Piyush Rai

Large, pre-trained models are problematic to use in resource-constrained applications. Fortunately, task-aware structured pruning methods offer a solution. These approaches reduce model size by dropping structural units like layers and…

Computation and Language · Computer Science 2023-11-14 Lucio Dery, David Grangier, Awni Hannun

The sizes of pretrained language models make them challenging and expensive to use when there are multiple desired downstream tasks. In this work, we adopt recent strategies for model pruning during finetuning to explore the question of…

Computation and Language · Computer Science 2021-12-13 Patrick Xia, Richard Shin

Model compression is crucial for deployment of neural networks on devices with limited computational and memory resources. Many different methods show comparable accuracy of the compressed model and similar compression rates. However, the…

Machine Learning · Computer Science 2020-08-21 Ben Mussay, Daniel Feldman, Samson Zhou, Vladimir Braverman, Margarita Osadchy

The outstanding performance and growing size of Large Language Models have led to increased attention to parameter-efficient learning. The two predominant approaches are Adapters and Pruning. Adapters are to freeze the model and give it a…

Computation and Language · Computer Science 2023-04-07 Guorun Wang, Jun Yang, Yaoru Sun

In this paper, we combine two-step knowledge distillation, structured pruning, truncation, and vocabulary trimming for extremely compressing multilingual encoder-only language models for low-resource languages. Our novel approach…

Computation and Language · Computer Science 2025-11-07 Daniil Gurgurov, Michal Gregor, Josef van Genabith, Simon Ostermann

Understanding and shaping the behaviour of Large Language Models (LLMs) is increasingly important as applications become more powerful and more frequently adopted. This paper introduces a machine unlearning method specifically designed for…

Machine Learning · Computer Science 2024-07-25 Nicholas Pochinkov, Nandi Schoots

Transformer-based language models have shown state-of-the-art performance on a variety of natural language understanding tasks. To achieve this performance, these models are first pre-trained on a general corpus and then fine-tuned on…

Computation and Language · Computer Science 2024-07-15 Mohammadreza Tayaranian, Seyyed Hasan Mozafari, Brett H. Meyer, James J. Clark, Warren J. Gross

Large language models have recently achieved state-of-the-art performance across a wide variety of natural language tasks. Meanwhile, the size of these models and their latency have significantly increased, which makes their usage costly,…

Computation and Language · Computer Science 2021-03-30 Ziheng Wang, Jeremy Wohlwend, Tao Lei

Large language models (LLMs) have revolutionized natural language processing, yet their substantial model sizes often require substantial computational resources. To preserve computing resources and accelerate inference speed, it is crucial…

Computation and Language · Computer Science 2025-06-04 Yirao Zhao, Guizhen Chen, Kenji Kawaguchi, Lidong Bing, Wenxuan Zhang

Transformer-based language models are applied to a wide range of applications in natural language processing. However, they are inefficient and difficult to deploy. In recent years, many compression algorithms have been proposed to increase…

Computation and Language · Computer Science 2021-11-11 Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat