Related papers: Automatic Pruning for Quantized Neural Networks

Pruning and Quantization for Deep Neural Network Acceleration: A Survey

Deep neural networks have been applied in many applications, exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-16 · Tailin Liang, John Glossner, Lei Wang, Shaobo Shi, Xiaotong Zhang

Quantisation and Pruning for Neural Network Compression and Regularisation

Deep neural networks are typically too computationally expensive to run in real-time on consumer-grade hardware and low-powered devices. In this paper, we investigate reducing the computational and memory requirements of neural networks…

Machine Learning · Computer Science · 2020-01-15 · Kimessha Paupamah, Steven James, Richard Klein

Activation Density driven Energy-Efficient Pruning in Training

Neural network pruning with suitable retraining can yield networks with considerably fewer parameters than the original with comparable degrees of accuracy. Typical pruning methods require large, fully trained networks as a starting point…

Machine Learning · Computer Science · 2020-10-13 · Timothy Foldy-Porto, Yeshwanth Venkatesha, Priyadarshini Panda

Structured Pruning of Neural Networks with Budget-Aware Regularization

Pruning methods have been shown to be effective at reducing the size of deep neural networks while keeping accuracy almost intact. Among the most effective methods are those that prune a network while training it with a sparsity prior loss and…

Neural and Evolutionary Computing · Computer Science · 2019-12-20 · Carl Lemaire, Andrew Achkar, Pierre-Marc Jodoin

Neural Network Pruning Through Constrained Reinforcement Learning

Network pruning reduces the size of neural networks by removing (pruning) neurons such that the performance drop is minimal. Traditional pruning approaches focus on designing metrics to quantify the usefulness of a neuron which is often…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-01 · Shehryar Malik, Muhammad Umair Haider, Omer Iqbal, Murtaza Taj

Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference

Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular…

Machine Learning · Computer Science · 2021-07-21 · Benjamin Hawks, Javier Duarte, Nicholas J. Fraser, Alessandro Pappalardo, Nhan Tran, Yaman Umuroglu

Automated Pruning for Deep Neural Network Compression

In this work we present a method to improve the pruning step of the current state-of-the-art methodology to compress neural networks. The novelty of the proposed pruning technique is in its differentiability, which allows pruning to be…

Computer Vision and Pattern Recognition · Computer Science · 2019-01-08 · Franco Manessi, Alessandro Rozza, Simone Bianco, Paolo Napoletano, Raimondo Schettini

Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks

The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization, which lead to latency and memory…

Machine Learning · Computer Science · 2024-09-25 · Beatrice Alessandra Motetti, Matteo Risso, Alessio Burrello, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari

Pruning vs Quantization: Which is Better?

Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is…

Machine Learning · Computer Science · 2024-02-19 · Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort

Integrating Pruning with Quantization for Efficient Deep Neural Networks Compression

Deep Neural Networks (DNNs) have achieved significant advances in a wide range of applications. However, their deployment on resource-constrained devices remains a challenge due to the large number of layers and parameters, which result in…

Neural and Evolutionary Computing · Computer Science · 2025-09-05 · Sara Makenali, Babak Rokh, Ali Azarpeyvand

Joint Pruning & Quantization for Extremely Sparse Neural Networks

We investigate pruning and quantization for deep neural networks. Our goal is to achieve extremely high sparsity for quantized networks to enable implementation on low cost and low power accelerator hardware. In a practical scenario, there…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-06 · Po-Hsiang Yu, Sih-Sian Wu, Jan P. Klopp, Liang-Gee Chen, Shao-Yi Chien

Confident magnitude-based neural network pruning

Pruning neural networks has proven to be a successful approach to increase the efficiency and reduce the memory storage of deep learning models without compromising performance. Previous literature has shown that it is possible to achieve a…

Machine Learning · Computer Science · 2024-08-12 · Joaquin Alvarez

Pruning at a Glance: Global Neural Pruning for Model Compression

Deep Learning models have become the dominant approach in several areas due to their high performance. Unfortunately, the size and hence computational requirements of operating such models can be considerably high. Therefore, this…

Computer Vision and Pattern Recognition · Computer Science · 2019-12-04 · Abdullah Salama, Oleksiy Ostapenko, Tassilo Klein, Moin Nabi

Neural Network Compression using Binarization and Few Full-Precision Weights

Quantization and pruning are two effective Deep Neural Networks model compression methods. In this paper, we propose Automatic Prune Binarization (APB), a novel compression technique combining quantization with pruning. APB enhances the…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-18 · Franco Maria Nardini, Cosimo Rulli, Salvatore Trani, Rossano Venturini

Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks

Resource-efficient convolutional neural networks enable not only intelligence on edge devices but also opportunities for system-level optimization, such as scheduling. In this work, we aim to improve the performance of resource-constrained…

Computer Vision and Pattern Recognition · Computer Science · 2018-10-19 · Ting-Wu Chin, Cha Zhang, Diana Marculescu

Pruning a neural network using Bayesian inference

Neural network pruning is a highly effective technique aimed at reducing the computational and memory demands of large neural networks. In this research paper, we present a novel approach to pruning neural networks utilizing Bayesian…

Machine Learning · Statistics · 2023-08-07 · Sunil Mathew, Daniel B. Rowe

A Probabilistic Approach to Neural Network Pruning

Neural network pruning techniques reduce the number of parameters without compromising the predictive ability of a network. Many algorithms have been developed for pruning both over-parameterized fully-connected networks (FCNs) and…

Machine Learning · Computer Science · 2021-05-24 · Xin Qian, Diego Klabjan

Network Automatic Pruning: Start NAP and Take a Nap

Network pruning can significantly reduce the computation and memory footprint of large neural networks. To achieve a good trade-off between model size and performance, popular pruning techniques usually rely on hand-crafted heuristics and…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-19 · Wenyuan Zeng, Yuwen Xiong, Raquel Urtasun

Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural Networks

Deep Neural Networks (DNNs) have many parameters and much activation data, both of which are expensive to implement. One method to reduce the size of a DNN is to quantize the pre-trained model by using a low-bit expression for weights and…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-26 · Jun Nishikawa, Ryoji Ikegaya

Efficient Inference of CNNs via Channel Pruning

The deployment of Convolutional Neural Networks (CNNs) on resource-constrained platforms such as mobile devices and embedded systems has been greatly hindered by their high implementation cost, and has thus motivated a lot of research interest in…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-12 · Boyu Zhang, Azadeh Davoodi, Yu Hen Hu