Related papers: Pruning Ternary Quantization

Ternary Quantization: A Survey

Inference time, model size, and accuracy are critical for deploying deep neural network models. Numerous research efforts have been made to compress neural network models with faster inference and higher accuracy. Pruning and quantization…

Machine Learning · Computer Science 2023-03-06 Dan Liu , Xue Liu

Hyperspherical Quantization: Toward Smaller and More Accurate Models

Model quantization enables the deployment of deep neural networks under resource-constrained devices. Vector quantization aims at reducing the model size by indexing model weights with full-precision embeddings, i.e., codewords, while the…

Computer Vision and Pattern Recognition · Computer Science 2022-12-27 Dan Liu , Xi Chen , Chen Ma , Xue Liu

Trained Ternary Quantization

Deep neural networks are widely used in machine learning applications. However, the deployment of large neural networks models can be difficult to deploy on mobile devices with limited power budgets. To solve this problem, we propose…

Machine Learning · Computer Science 2017-02-24 Chenzhuo Zhu , Song Han , Huizi Mao , William J. Dally

Towards Optimal Compression: Joint Pruning and Quantization

Model compression is instrumental in optimizing deep neural network inference on resource-constrained hardware. The prevailing methods for network compression, namely quantization and pruning, have been shown to enhance efficiency at the…

Machine Learning · Computer Science 2023-06-13 Ben Zandonati , Glenn Bucagu , Adrian Alan Pol , Maurizio Pierini , Olya Sirkin , Tal Kopetz

PD-Quant: Post-Training Quantization based on Prediction Difference Metric

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jiawei Liu , Lin Niu , Zhihang Yuan , Dawei Yang , Xinggang Wang , Wenyu Liu

Retraining-Based Iterative Weight Quantization for Deep Neural Networks

Model compression has gained a lot of attention due to its ability to reduce hardware resource requirements significantly while maintaining accuracy of DNNs. Model compression is especially useful for memory-intensive recurrent neural…

Machine Learning · Computer Science 2018-05-30 Dongsoo Lee , Byeongwook Kim

Ternary Neural Networks with Fine-Grained Quantization

We propose a novel fine-grained quantization (FGQ) method to ternarize pre-trained full precision models, while also constraining activations to 8 and 4-bits. Using this method, we demonstrate a minimal loss in classification accuracy on…

Machine Learning · Computer Science 2017-05-31 Naveen Mellempudi , Abhisek Kundu , Dheevatsa Mudigere , Dipankar Das , Bharat Kaul , Pradeep Dubey

Pruning and Quantization for Deep Neural Network Acceleration: A Survey

Deep neural networks have been applied in many applications exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant…

Computer Vision and Pattern Recognition · Computer Science 2021-06-16 Tailin Liang , John Glossner , Lei Wang , Shaobo Shi , Xiaotong Zhang

Adaptive Binary-Ternary Quantization

Neural network models are resource hungry. It is difficult to deploy such deep networks on devices with limited resources, like smart wearables, cellphones, drones, and autonomous vehicles. Low bit quantization such as binary and ternary…

Machine Learning · Computer Science 2021-09-15 Ryan Razani , Grégoire Morin , Vahid Partovi Nia , Eyyüb Sari

Integrating Pruning with Quantization for Efficient Deep Neural Networks Compression

Deep Neural Networks (DNNs) have achieved significant advances in a wide range of applications. However, their deployment on resource-constrained devices remains a challenge due to the large number of layers and parameters, which result in…

Neural and Evolutionary Computing · Computer Science 2025-09-05 Sara Makenali , Babak Rokh , Ali Azarpeyvand

Sensitivity-Aware Post-Training Quantization for Deep Neural Networks

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Zekang Zheng , Haokun Li , Yaofo Chen , Mingkui Tan , Qing Du

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

As Deep Neural Networks (DNNs) usually are overparameterized and have millions of weight parameters, it is challenging to deploy these large DNN models on resource-constrained hardware platforms, e.g., smartphones. Numerous network…

Computer Vision and Pattern Recognition · Computer Science 2022-05-24 Peng Hu , Xi Peng , Hongyuan Zhu , Mohamed M. Sabry Aly , Jie Lin

Automated Model Compression by Jointly Applied Pruning and Quantization

In the traditional deep compression framework, iteratively performing network pruning and quantization can reduce the model size and computation cost to meet the deployment requirements. However, such a step-wise application of pruning and…

Computer Vision and Pattern Recognition · Computer Science 2020-11-13 Wenting Tang , Xingxing Wei , Bo Li

Automatic Pruning for Quantized Neural Networks

Neural network quantization and pruning are two techniques commonly used to reduce the computational complexity and memory footprint of these models for deployment. However, most existing pruning strategies operate on full-precision and…

Computer Vision and Pattern Recognition · Computer Science 2020-02-04 Luis Guerra , Bohan Zhuang , Ian Reid , Tom Drummond

Neural Networks Weights Quantization: Target None-retraining Ternary (TNT)

Quantization of weights of deep neural networks (DNN) has proven to be an effective solution for the purpose of implementing DNNs on edge devices such as mobiles, ASICs and FPGAs, because they have no sufficient resources to support…

Machine Learning · Computer Science 2019-12-20 Tianyu Zhang , Lei Zhu , Qian Zhao , Kilho Shin

3DQ: Compact Quantized Neural Networks for Volumetric Whole Brain Segmentation

Model architectures have been dramatically increasing in size, improving performance at the cost of resource requirements. In this paper we propose 3DQ, a ternary quantization method, applied for the first time to 3D Fully Convolutional…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Magdalini Paschali , Stefano Gasperini , Abhijit Guha Roy , Michael Y. -S. Fang , Nassir Navab

Kernel Quantization for Efficient Network Compression

This paper presents a novel network compression framework Kernel Quantization (KQ), targeting to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version without significant…

Machine Learning · Computer Science 2020-03-12 Zhongzhi Yu , Yemin Shi , Tiejun Huang , Yizhou Yu

Neural Network Compression using Binarization and Few Full-Precision Weights

Quantization and pruning are two effective Deep Neural Networks model compression methods. In this paper, we propose Automatic Prune Binarization (APB), a novel compression technique combining quantization with pruning. APB enhances the…

Computer Vision and Pattern Recognition · Computer Science 2023-09-18 Franco Maria Nardini , Cosimo Rulli , Salvatore Trani , Rossano Venturini

Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural Networks

Deep Neural Networks(DNNs) have many parameters and activation data, and these both are expensive to implement. One method to reduce the size of the DNN is to quantize the pre-trained model by using a low-bit expression for weights and…

Computer Vision and Pattern Recognition · Computer Science 2020-11-26 Jun Nishikawa , Ryoji Ikegaya

Q-Diffusion: Quantizing Diffusion Models

Diffusion models have achieved great success in image synthesis through iterative noise estimation using deep neural networks. However, the slow inference, high memory consumption, and computation intensity of the noise estimation model…

Computer Vision and Pattern Recognition · Computer Science 2023-06-09 Xiuyu Li , Yijiang Liu , Long Lian , Huanrui Yang , Zhen Dong , Daniel Kang , Shanghang Zhang , Kurt Keutzer