Related papers: Numerical Pruning for Efficient Autoregressive Mod…

Can pruning make Large Language Models more efficient?

Transformer models have revolutionized natural language processing with their unparalleled ability to grasp complex contextual relationships. However, the vast number of parameters in these models has raised concerns regarding computational…

Machine Learning · Computer Science 2023-10-10 Sia Gholami , Marwan Omar

Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling

While Transformer-based models have shown impressive language modeling performance, the large computation cost is often prohibitive for practical use. Attention head pruning, which removes unnecessary attention heads in the multihead…

Computation and Language · Computer Science 2021-10-08 Kyuhong Shim , Iksoo Choi , Wonyong Sung , Jungwook Choi

Pruning with Compensation: Efficient Channel Pruning for Deep Convolutional Neural Networks

Channel pruning is a promising technique to compress the parameters of deep convolutional neural networks(DCNN) and to speed up the inference. This paper aims to address the long-standing inefficiency of channel pruning. Most channel…

Computer Vision and Pattern Recognition · Computer Science 2021-09-01 Zhouyang Xie , Yan Fu , Shengzhao Tian , Junlin Zhou , Duanbing Chen

Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads

Deep pre-trained Transformer models have achieved state-of-the-art results over a variety of natural language processing (NLP) tasks. By learning rich language knowledge with millions of parameters, these models are usually…

Computation and Language · Computer Science 2020-11-10 Zhengyan Zhang , Fanchao Qi , Zhiyuan Liu , Qun Liu , Maosong Sun

WeightMom: Learning Sparse Networks using Iterative Momentum-based pruning

Deep Neural Networks have been used in a wide variety of applications with significant success. However, their highly complex nature owing to comprising millions of parameters has lead to problems during deployment in pipelines with low…

Machine Learning · Computer Science 2022-08-15 Elvis Johnson , Xiaochen Tang , Sriramacharyulu Samudrala

Automatic Attention Pruning: Improving and Automating Model Pruning using Attentions

Pruning is a promising approach to compress deep learning models in order to deploy them on resource-constrained edge devices. However, many existing pruning solutions are based on unstructured pruning, which yields models that cannot…

Machine Learning · Computer Science 2023-03-16 Kaiqi Zhao , Animesh Jain , Ming Zhao

Optimizing Dense Feed-Forward Neural Networks

Deep learning models have been widely used during the last decade due to their outstanding learning and abstraction capacities. However, one of the main challenges any scientist has to face using deep learning models is to establish the…

Machine Learning · Computer Science 2025-04-22 Luis Balderas , Miguel Lastra , José M. Benítez

Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction

This work is focused on the pruning of some convolutional neural networks (CNNs) and improving theirs efficiency on graphic processing units (GPU) by using a direct sparse algorithm. The Nvidia deep neural network (cuDnn) library is the…

Machine Learning · Computer Science 2022-08-30 Marcin Pietroń , Dominik Żurek

Automatic Block-wise Pruning with Auxiliary Gating Structures for Deep Convolutional Neural Networks

Convolutional neural networks are prevailing in deep learning tasks. However, they suffer from massive cost issues when working on mobile devices. Network pruning is an effective method of model compression to handle such problems. This…

Computer Vision and Pattern Recognition · Computer Science 2022-05-10 Zhaofeng Si , Honggang Qi , Xiaoyu Song

Compression of Neural Machine Translation Models via Pruning

Neural Machine Translation (NMT), like many other deep learning domains, typically suffers from over-parameterization, resulting in large storage sizes. This paper examines three simple magnitude-based pruning schemes to compress NMT…

Artificial Intelligence · Computer Science 2016-07-01 Abigail See , Minh-Thang Luong , Christopher D. Manning

Accelerating DNN Training with Structured Data Gradient Pruning

Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient by reducing the number of model parameters over the course of training. However, most weight pruning techniques generally does not…

Machine Learning · Computer Science 2022-02-03 Bradley McDanel , Helia Dinh , John Magallanes

Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning

Convolutional Neural Networks (CNNs) have a large number of parameters and take significantly large hardware resources to compute, so edge devices struggle to run high-level networks. This paper proposes a novel method to reduce the…

Computer Vision and Pattern Recognition · Computer Science 2023-01-27 Athul Shibu , Abhishek Kumar , Heechul Jung , Dong-Gyu Lee

Adaptive Activation-based Structured Pruning

Pruning is a promising approach to compress complex deep learning models in order to deploy them on resource-constrained edge devices. However, many existing pruning solutions are based on unstructured pruning, which yields models that…

Computer Vision and Pattern Recognition · Computer Science 2023-03-13 Kaiqi Zhao , Animesh Jain , Ming Zhao

Neural Language Model Pruning for Automatic Speech Recognition

We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning frame work, namely criterion, method and scheduler, analyzing their…

Machine Learning · Computer Science 2023-10-06 Leonardo Emili , Thiago Fraga-Silva , Ernest Pusateri , Markus Nußbaum-Thom , Youssef Oualil

Dynamic Model Pruning with Feedback

Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only due to high memory requirements but also because of increased latency at inference. We propose a novel model compression…

Machine Learning · Computer Science 2020-06-15 Tao Lin , Sebastian U. Stich , Luis Barba , Daniil Dmitriev , Martin Jaggi

AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates

Structured weight pruning is a representative model compression technique of DNNs to reduce the storage and computation requirements and accelerate inference. An automatic hyperparameter determination process is necessary due to the large…

Machine Learning · Computer Science 2019-09-12 Ning Liu , Xiaolong Ma , Zhiyuan Xu , Yanzhi Wang , Jian Tang , Jieping Ye

Elimination-compensation pruning for fully-connected neural networks

The unmatched ability of Deep Neural Networks in capturing complex patterns in large and noisy datasets is often associated with their large hypothesis space, and consequently to the vast amount of parameters that characterize model…

Machine Learning · Computer Science 2026-02-25 Enrico Ballini , Luca Muscarnera , Alessio Fumagalli , Anna Scotti , Francesco Regazzoni

Differentiable Subset Pruning of Transformer Heads

Multi-head attention, a collection of several attention mechanisms that independently attend to different parts of the input, is the key ingredient in the Transformer. Recent work has shown, however, that a large proportion of the heads in…

Computation and Language · Computer Science 2023-07-28 Jiaoda Li , Ryan Cotterell , Mrinmaya Sachan

Weight Pruning via Adaptive Sparsity Loss

Pruning neural networks has regained interest in recent years as a means to compress state-of-the-art deep neural networks and enable their deployment on resource-constrained devices. In this paper, we propose a robust compressive learning…

Machine Learning · Computer Science 2020-06-05 George Retsinas , Athena Elafrou , Georgios Goumas , Petros Maragos

Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models

Given a pretrained encoder-based language model, how can we accurately compress it without retraining? Retraining-free structured pruning algorithms are crucial in pretrained language model compression due to their significantly reduced…

Computation and Language · Computer Science 2024-03-18 Seungcheol Park , Hojun Choi , U Kang