Related papers: Physics Inspired Criterion for Pruning-Quantizatio…

Differentiable Joint Pruning and Quantization for Hardware Efficiency

We present a differentiable joint pruning and quantization (DJPQ) scheme. We frame neural network compression as a joint gradient-based optimization problem, trading off between model pruning and quantization automatically for hardware…

Machine Learning · Computer Science 2021-04-06 Ying Wang, Yadong Lu, Tijmen Blankevoort

PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation

As edge devices become prevalent, deploying Deep Neural Networks (DNN) on edge devices has become a critical issue. However, DNN requires a high computational resource which is rarely available for edge devices. To handle this, we propose a…

Machine Learning · Computer Science 2021-06-29 Jangho Kim, Simyung Chang, Nojun Kwak

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner. To…

Machine Learning · Computer Science 2020-06-16 Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Song Han

PD-Quant: Post-Training Quantization based on Prediction Difference Metric

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

Integrating Pruning with Quantization for Efficient Deep Neural Networks Compression

Deep Neural Networks (DNNs) have achieved significant advances in a wide range of applications. However, their deployment on resource-constrained devices remains a challenge due to the large number of layers and parameters, which result in…

Neural and Evolutionary Computing · Computer Science 2025-09-05 Sara Makenali, Babak Rokh, Ali Azarpeyvand

CoDeQ: End-to-End Joint Model Compression with Dead-Zone Quantizer for High-Sparsity and Low-Precision Networks

While joint pruning-quantization is theoretically superior to sequential application, current joint methods rely on auxiliary procedures outside the training loop for finding compression parameters. This reliance adds engineering…

Machine Learning · Computer Science 2025-12-16 Jonathan Wenshøj, Tong Chen, Bob Pepin, Raghavendra Selvan

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

As Deep Neural Networks (DNNs) usually are overparameterized and have millions of weight parameters, it is challenging to deploy these large DNN models on resource-constrained hardware platforms, e.g., smartphones. Numerous network…

Computer Vision and Pattern Recognition · Computer Science 2022-05-24 Peng Hu, Xi Peng, Hongyuan Zhu, Mohamed M. Sabry Aly, Jie Lin

Pruning by Explaining: A Novel Criterion for Deep Neural Network Pruning

The success of convolutional neural networks (CNNs) in various applications is accompanied by a significant increase in computation and parameter storage costs. Recent efforts to reduce these overheads involve pruning and compressing the…

Machine Learning · Computer Science 2021-03-15 Seul-Ki Yeom, Philipp Seegerer, Sebastian Lapuschkin, Alexander Binder, Simon Wiedemann, Klaus-Robert Müller, Wojciech Samek

Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural Networks

Deep Neural Networks (DNNs) have many parameters and activation data, and both are expensive to implement. One method to reduce the size of the DNN is to quantize the pre-trained model by using a low-bit expression for weights and…

Computer Vision and Pattern Recognition · Computer Science 2020-11-26 Jun Nishikawa, Ryoji Ikegaya

Pruning Ternary Quantization

Inference time, model size, and accuracy are three key factors in deep model compression. Most of the existing work addresses these three key factors separately as it is difficult to optimize them all at the same time. For example, low-bit…

Computer Vision and Pattern Recognition · Computer Science 2023-07-18 Dan Liu, Xi Chen, Jie Fu, Chen Ma, Xue Liu

Towards Optimal Compression: Joint Pruning and Quantization

Model compression is instrumental in optimizing deep neural network inference on resource-constrained hardware. The prevailing methods for network compression, namely quantization and pruning, have been shown to enhance efficiency at the…

Machine Learning · Computer Science 2023-06-13 Ben Zandonati, Glenn Bucagu, Adrian Alan Pol, Maurizio Pierini, Olya Sirkin, Tal Kopetz

Physics-Inspired Binary Neural Networks: Interpretable Compression with Theoretical Guarantees

Why rely on dense neural networks and then blindly sparsify them when prior knowledge about the problem structure is already available? Many inverse problems admit algorithm-unrolled networks that naturally encode physics and sparsity. In…

Machine Learning · Computer Science 2025-10-14 Arian Eamaz, Farhang Yeganegi, Mojtaba Soltanalian

Rate-Distortion Optimized Post-Training Quantization for Learned Image Compression

Quantizing a floating-point neural network to its fixed-point representation is crucial for Learned Image Compression (LIC) because it improves decoding consistency for interoperability and reduces space-time complexity for implementation.…

Image and Video Processing · Electrical Eng. & Systems 2023-10-10 Junqi Shi, Ming Lu, Zhan Ma

Joint Quantization and Pruning Neural Networks Approach: A Case Study on FSO Receivers

Towards fast, hardware-efficient, and low-complexity receivers, we propose a compression-aware learning approach and examine it on free-space optical (FSO) receivers for turbulence mitigation. The learning approach jointly quantizes, prunes,…

Signal Processing · Electrical Eng. & Systems 2026-01-13 Mohanad Obeed, Ming Jian

DeepHQ: Learned Hierarchical Quantizer for Progressive Deep Image Coding

Unlike fixed- or variable-rate image coding, progressive image coding (PIC) aims to compress various qualities of images into a single bitstream, increasing the versatility of bitstream utilization and providing high compression efficiency…

Image and Video Processing · Electrical Eng. & Systems 2025-11-04 Jooyoung Lee, Se Yoon Jeong, Munchurl Kim

Training Deep Neural Networks with Joint Quantization and Pruning of Weights and Activations

Quantization and pruning are core techniques used to reduce the inference costs of deep neural networks. State-of-the-art quantization techniques are currently applied to both the weights and activations; however, pruning is most often…

Machine Learning · Computer Science 2021-11-02 Xinyu Zhang, Ian Colbert, Ken Kreutz-Delgado, Srinjoy Das

Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks

The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization, which lead to latency and memory…

Machine Learning · Computer Science 2024-09-25 Beatrice Alessandra Motetti, Matteo Risso, Alessio Burrello, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari

Pruning Deep Neural Networks from a Sparsity Perspective

In recent years, deep network pruning has attracted significant attention in order to enable the rapid deployment of AI into small devices with computation and memory constraints. Pruning is often achieved by dropping redundant weights,…

Machine Learning · Computer Science 2023-08-24 Enmao Diao, Ganghua Wang, Jiawei Zhan, Yuhong Yang, Jie Ding, Vahid Tarokh

Unified Stochastic Framework for Neural Network Quantization and Pruning

Quantization and pruning are two essential techniques for compressing neural networks, yet they are often treated independently, with limited theoretical analysis connecting them. This paper introduces a unified framework for post-training…

Machine Learning · Computer Science 2025-05-21 Haoyu Zhang, Rayan Saab

Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks

Recently, deep learning has become a de facto standard in machine learning with convolutional neural networks (CNNs) demonstrating spectacular success on a wide variety of tasks. However, CNNs are typically very demanding computationally at…

Computer Vision and Pattern Recognition · Computer Science 2019-09-27 Yochai Zur, Chaim Baskin, Evgenii Zheltonozhskii, Brian Chmiel, Itay Evron, Alex M. Bronstein, Avi Mendelson