Related papers: Automatic Joint Structured Pruning and Quantizatio…

Training Deep Neural Networks with Joint Quantization and Pruning of Weights and Activations

Quantization and pruning are core techniques used to reduce the inference costs of deep neural networks. State-of-the-art quantization techniques are currently applied to both the weights and activations; however, pruning is most often…

Machine Learning · Computer Science 2021-11-02 Xinyu Zhang , Ian Colbert , Ken Kreutz-Delgado , Srinjoy Das

Integrating Pruning with Quantization for Efficient Deep Neural Networks Compression

Deep Neural Networks (DNNs) have achieved significant advances in a wide range of applications. However, their deployment on resource-constrained devices remains a challenge due to the large number of layers and parameters, which result in…

Neural and Evolutionary Computing · Computer Science 2025-09-05 Sara Makenali , Babak Rokh , Ali Azarpeyvand

Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks

The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization, which lead to latency and memory…

Machine Learning · Computer Science 2024-09-25 Beatrice Alessandra Motetti , Matteo Risso , Alessio Burrello , Enrico Macii , Massimo Poncino , Daniele Jahier Pagliari

Automatic low-bit hybrid quantization of neural networks through meta learning

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference, especially when deploying to edge or IoT devices with limited computation capacity and power consumption budget. The uniform bit…

Machine Learning · Computer Science 2020-04-27 Tao Wang , Junsong Wang , Chang Xu , Chao Xue

Pruning and Quantization for Deep Neural Network Acceleration: A Survey

Deep neural networks have been applied in many applications exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant…

Computer Vision and Pattern Recognition · Computer Science 2021-06-16 Tailin Liang , John Glossner , Lei Wang , Shaobo Shi , Xiaotong Zhang

FeTa: A DCA Pruning Algorithm with Generalization Error Guarantees

Recent DNN pruning algorithms have succeeded in reducing the number of parameters in fully connected layers, often with little or no drop in classification accuracy. However, most of the existing pruning schemes either have to be applied…

Machine Learning · Computer Science 2018-03-13 Konstantinos Pitas , Mike Davies , Pierre Vandergheynst

Pruning and Quantization Impact on Graph Neural Networks

Graph neural networks (GNNs) are known to operate with high accuracy on learning from graph-structured data, but they suffer from high computational and resource costs. Neural network compression methods are used to reduce the model size…

Machine Learning · Computer Science 2025-10-28 Khatoon Khedri , Reza Rawassizadeh , Qifu Wen , Mehdi Hosseinzadeh

Optimizing Deep Neural Networks using Safety-Guided Self Compression

The deployment of deep neural networks on resource-constrained devices necessitates effective model com- pression strategies that judiciously balance the reduction of model size with the preservation of performance. This study introduces a…

Machine Learning · Computer Science 2025-05-02 Mohammad Zbeeb , Mariam Salman , Mohammad Bazzi , Ammar Mohanna

Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural Networks

Deep Neural Networks(DNNs) have many parameters and activation data, and these both are expensive to implement. One method to reduce the size of the DNN is to quantize the pre-trained model by using a low-bit expression for weights and…

Computer Vision and Pattern Recognition · Computer Science 2020-11-26 Jun Nishikawa , Ryoji Ikegaya

Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-based Approach

Deep Neural Networks (DNNs) are applied in a wide range of usecases. There is an increased demand for deploying DNNs on devices that do not have abundant resources such as memory and computation units. Recently, network compression through…

Machine Learning · Computer Science 2020-05-19 Haichuan Yang , Shupeng Gui , Yuhao Zhu , Ji Liu

Data-Efficient Structured Pruning via Submodular Optimization

Structured pruning is an effective approach for compressing large pre-trained neural networks without significantly affecting their performance. However, most current structured pruning methods do not provide any performance guarantees, and…

Machine Learning · Computer Science 2023-02-14 Marwa El Halabi , Suraj Srinivas , Simon Lacoste-Julien

The Hardware Impact of Quantization and Pruning for Weights in Spiking Neural Networks

Energy efficient implementations and deployments of Spiking neural networks (SNNs) have been of great interest due to the possibility of developing artificial systems that can achieve the computational powers and energy efficiency of the…

Machine Learning · Computer Science 2023-02-09 Clemens JS Schaefer , Pooria Taheri , Mark Horeni , Siddharth Joshi

A Unified DNN Weight Compression Framework Using Reweighted Optimization Methods

To address the large model size and intensive computation requirement of deep neural networks (DNNs), weight pruning techniques have been proposed and generally fall into two categories, i.e., static regularization-based pruning and dynamic…

Machine Learning · Computer Science 2020-04-14 Tianyun Zhang , Xiaolong Ma , Zheng Zhan , Shanglin Zhou , Minghai Qin , Fei Sun , Yen-Kuang Chen , Caiwen Ding , Makan Fardad , Yanzhi Wang

Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful,…

Machine Learning · Computer Science 2023-03-01 Riade Benbaki , Wenyu Chen , Xiang Meng , Hussein Hazimeh , Natalia Ponomareva , Zhe Zhao , Rahul Mazumder

Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy hungry at an exponential pace, while at the same time, there is a vast demand for…

Machine Learning · Computer Science 2023-12-27 Konstantinos Balaskas , Andreas Karatzas , Christos Sad , Kostas Siozios , Iraklis Anagnostopoulos , Georgios Zervakis , Jörg Henkel

Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression

Compressing Deep Neural Network (DNN) models to alleviate the storage and computation requirements is essential for practical applications, especially for resource limited devices. Although capable of reducing a reasonable amount of model…

Machine Learning · Computer Science 2021-06-17 Sheng Lin , Wei Jiang , Wei Wang , Kaidi Xu , Yanzhi Wang , Shan Liu , Songnan Li

Application-Specific Component-Aware Structured Pruning of Deep Neural Networks in Control via Soft Coefficient Optimization

Deep neural networks (DNNs) offer significant flexibility and robust performance. This makes them ideal for building not only system models but also advanced neural network controllers (NNCs). However, their high complexity and…

Machine Learning · Computer Science 2025-11-14 Ganesh Sundaram , Jonas Ulmen , Amjad Haider , Daniel Görges

Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?

Large deep neural network (DNN) models pose the key challenge to energy efficiency due to the significantly higher energy consumption of off-chip DRAM accesses than arithmetic or SRAM operations. It motivates the intensive research on model…

Machine Learning · Computer Science 2020-01-09 Xiaolong Ma , Sheng Lin , Shaokai Ye , Zhezhi He , Linfeng Zhang , Geng Yuan , Sia Huat Tan , Zhengang Li , Deliang Fan , Xuehai Qian , Xue Lin , Kaisheng Ma , Yanzhi Wang

A Closer Look at Structured Pruning for Neural Network Compression

Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channel connections and fine-tuning; reducing the overall width of the network. However, the efficacy of…

Machine Learning · Statistics 2019-06-10 Elliot J. Crowley , Jack Turner , Amos Storkey , Michael O'Boyle

Iteratively Training Look-Up Tables for Network Quantization

Operating deep neural networks (DNNs) on devices with limited resources requires the reduction of their memory as well as computational footprint. Popular reduction methods are network quantization or pruning, which either reduce the word…

Machine Learning · Computer Science 2023-07-19 Fabien Cardinaux , Stefan Uhlich , Kazuki Yoshiyama , Javier Alonso Garcia , Lukas Mauch , Stephen Tiedemann , Thomas Kemp , Akira Nakamura