English
Related papers

Related papers: Numerical Pruning for Efficient Autoregressive Mod…

200 papers

Transformer models have revolutionized natural language processing with their unparalleled ability to grasp complex contextual relationships. However, the vast number of parameters in these models has raised concerns regarding computational…

Machine Learning · Computer Science 2023-10-10 Sia Gholami , Marwan Omar

While Transformer-based models have shown impressive language modeling performance, the large computation cost is often prohibitive for practical use. Attention head pruning, which removes unnecessary attention heads in the multihead…

Computation and Language · Computer Science 2021-10-08 Kyuhong Shim , Iksoo Choi , Wonyong Sung , Jungwook Choi

Channel pruning is a promising technique to compress the parameters of deep convolutional neural networks(DCNN) and to speed up the inference. This paper aims to address the long-standing inefficiency of channel pruning. Most channel…

Computer Vision and Pattern Recognition · Computer Science 2021-09-01 Zhouyang Xie , Yan Fu , Shengzhao Tian , Junlin Zhou , Duanbing Chen

Deep pre-trained Transformer models have achieved state-of-the-art results over a variety of natural language processing (NLP) tasks. By learning rich language knowledge with millions of parameters, these models are usually…

Computation and Language · Computer Science 2020-11-10 Zhengyan Zhang , Fanchao Qi , Zhiyuan Liu , Qun Liu , Maosong Sun

Deep Neural Networks have been used in a wide variety of applications with significant success. However, their highly complex nature owing to comprising millions of parameters has lead to problems during deployment in pipelines with low…

Machine Learning · Computer Science 2022-08-15 Elvis Johnson , Xiaochen Tang , Sriramacharyulu Samudrala

Pruning is a promising approach to compress deep learning models in order to deploy them on resource-constrained edge devices. However, many existing pruning solutions are based on unstructured pruning, which yields models that cannot…

Machine Learning · Computer Science 2023-03-16 Kaiqi Zhao , Animesh Jain , Ming Zhao

Deep learning models have been widely used during the last decade due to their outstanding learning and abstraction capacities. However, one of the main challenges any scientist has to face using deep learning models is to establish the…

Machine Learning · Computer Science 2025-04-22 Luis Balderas , Miguel Lastra , José M. Benítez

This work is focused on the pruning of some convolutional neural networks (CNNs) and improving theirs efficiency on graphic processing units (GPU) by using a direct sparse algorithm. The Nvidia deep neural network (cuDnn) library is the…

Machine Learning · Computer Science 2022-08-30 Marcin Pietroń , Dominik Żurek

Convolutional neural networks are prevailing in deep learning tasks. However, they suffer from massive cost issues when working on mobile devices. Network pruning is an effective method of model compression to handle such problems. This…

Computer Vision and Pattern Recognition · Computer Science 2022-05-10 Zhaofeng Si , Honggang Qi , Xiaoyu Song

Neural Machine Translation (NMT), like many other deep learning domains, typically suffers from over-parameterization, resulting in large storage sizes. This paper examines three simple magnitude-based pruning schemes to compress NMT…

Artificial Intelligence · Computer Science 2016-07-01 Abigail See , Minh-Thang Luong , Christopher D. Manning

Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient by reducing the number of model parameters over the course of training. However, most weight pruning techniques generally does not…

Machine Learning · Computer Science 2022-02-03 Bradley McDanel , Helia Dinh , John Magallanes

Convolutional Neural Networks (CNNs) have a large number of parameters and take significantly large hardware resources to compute, so edge devices struggle to run high-level networks. This paper proposes a novel method to reduce the…

Computer Vision and Pattern Recognition · Computer Science 2023-01-27 Athul Shibu , Abhishek Kumar , Heechul Jung , Dong-Gyu Lee

Pruning is a promising approach to compress complex deep learning models in order to deploy them on resource-constrained edge devices. However, many existing pruning solutions are based on unstructured pruning, which yields models that…

Computer Vision and Pattern Recognition · Computer Science 2023-03-13 Kaiqi Zhao , Animesh Jain , Ming Zhao

We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning frame work, namely criterion, method and scheduler, analyzing their…

Machine Learning · Computer Science 2023-10-06 Leonardo Emili , Thiago Fraga-Silva , Ernest Pusateri , Markus Nußbaum-Thom , Youssef Oualil

Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only due to high memory requirements but also because of increased latency at inference. We propose a novel model compression…

Machine Learning · Computer Science 2020-06-15 Tao Lin , Sebastian U. Stich , Luis Barba , Daniil Dmitriev , Martin Jaggi

Structured weight pruning is a representative model compression technique of DNNs to reduce the storage and computation requirements and accelerate inference. An automatic hyperparameter determination process is necessary due to the large…

Machine Learning · Computer Science 2019-09-12 Ning Liu , Xiaolong Ma , Zhiyuan Xu , Yanzhi Wang , Jian Tang , Jieping Ye

The unmatched ability of Deep Neural Networks in capturing complex patterns in large and noisy datasets is often associated with their large hypothesis space, and consequently to the vast amount of parameters that characterize model…

Machine Learning · Computer Science 2026-02-25 Enrico Ballini , Luca Muscarnera , Alessio Fumagalli , Anna Scotti , Francesco Regazzoni

Multi-head attention, a collection of several attention mechanisms that independently attend to different parts of the input, is the key ingredient in the Transformer. Recent work has shown, however, that a large proportion of the heads in…

Computation and Language · Computer Science 2023-07-28 Jiaoda Li , Ryan Cotterell , Mrinmaya Sachan

Pruning neural networks has regained interest in recent years as a means to compress state-of-the-art deep neural networks and enable their deployment on resource-constrained devices. In this paper, we propose a robust compressive learning…

Machine Learning · Computer Science 2020-06-05 George Retsinas , Athena Elafrou , Georgios Goumas , Petros Maragos

Given a pretrained encoder-based language model, how can we accurately compress it without retraining? Retraining-free structured pruning algorithms are crucial in pretrained language model compression due to their significantly reduced…

Computation and Language · Computer Science 2024-03-18 Seungcheol Park , Hojun Choi , U Kang
‹ Prev 1 2 3 10 Next ›