Related papers: Towards Efficient Training for Neural Network Quan…

PACT: Parameterized Clipping Activation for Quantized Neural Networks

Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed - but most of these techniques focused on quantizing…

Computer Vision and Pattern Recognition · Computer Science · 2018-07-18 · Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan

Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN)

Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. In order to reduce this cost, several quantization schemes have gained attention recently with some focusing on weight…

Computer Vision and Pattern Recognition · Computer Science · 2018-07-19 · Jungwook Choi, Pierce I-Jen Chuang, Zhuo Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan

Quantized and Interpretable Learning Scheme for Deep Neural Networks in Classification Task

Deep learning techniques have proven highly effective in image classification, but their deployment in resource-constrained environments remains challenging due to high computational demands. Furthermore, their interpretability is of high…

Machine Learning · Computer Science · 2024-12-06 · Alireza Maleki, Mahsa Lavaei, Mohsen Bagheritabar, Salar Beigzad, Zahra Abadi

Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients

Network quantization generally converts full-precision weights and/or activations into low-bit fixed-point values in order to accelerate an inference process. Recent approaches to network quantization further discretize the gradients into…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-18 · Dohyung Kim, Junghyup Lee, Jeimin Jeon, Jaehyeon Moon, Bumsub Ham

MetaGrad: Adaptive Gradient Quantization with Hypernetworks

A popular track of network compression approach is Quantization aware Training (QAT), which accelerates the forward pass during the neural network training and inference. However, not much prior efforts have been made to quantize and…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-02 · Kaixin Xu, Alina Hui Xiu Lee, Ziyuan Zhao, Zhe Wang, Min Wu, Weisi Lin

Precision Neural Network Quantization via Learnable Adaptive Modules

Quantization Aware Training (QAT) is a neural network quantization technique that compresses model size and improves operational efficiency while effectively maintaining model performance. The paradigm of QAT is to introduce fake…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-25 · Wenqiang Zhou, Zhendong Yu, Xinyu Liu, Jiaming Yang, Rong Xiao, Tao Wang, Chenwei Tang, Jiancheng Lv

BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation

Neural network quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation, while preserving the performance of the original…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-05 · Geon Park, Jaehong Yoon, Haiyang Zhang, Xing Zhang, Sung Ju Hwang, Yonina C. Eldar

Compute-Optimal Quantization-Aware Training

Quantization-aware training (QAT) is a leading technique for improving the accuracy of quantized neural networks. Previous work has shown that decomposing training into a full-precision (FP) phase followed by a QAT phase yields superior…

Machine Learning · Computer Science · 2026-02-27 · Aleksandr Dremov, David Grangier, Angelos Katharopoulos, Awni Hannun

Saliency Assisted Quantization for Neural Networks

Deep learning methods have established a significant place in image classification. While prior research has focused on enhancing final outcomes, the opaque nature of the decision-making process in these models remains a concern for…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-12 · Elmira Mousa Rezabeyk, Salar Beigzad, Yasin Hamzavi, Mohsen Bagheritabar, Seyedeh Sogol Mirikhoozani

Error-aware Quantization through Noise Tempering

Quantization has become a predominant approach for model compression, enabling deployment of large models trained on GPUs onto smaller form-factor devices for inference. Quantization-aware training (QAT) optimizes model parameters with…

Machine Learning · Computer Science · 2022-12-13 · Zheng Wang, Juncheng B Li, Shuhui Qu, Florian Metze, Emma Strubell

Overcoming Oscillations in Quantization-Aware Training

When training neural networks with simulated quantization, we observe that quantized weights can, rather unexpectedly, oscillate between two grid-points. The importance of this effect and its impact on quantization-aware training (QAT) are…

Machine Learning · Computer Science · 2022-06-30 · Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort

Neural Networks with Quantization Constraints

Enabling low precision implementations of deep learning models, without considerable performance degradation, is necessary in resource and latency constrained settings. Moreover, exploiting the differences in sensitivity to quantization…

Machine Learning · Computer Science · 2022-10-28 · Ignacio Hounie, Juan Elenter, Alejandro Ribeiro

Scalable Methods for 8-bit Training of Neural Networks

Quantized Neural Networks (QNNs) are often used to improve network efficiency during the inference phase, i.e. after the network has been trained. Extensive research in the field suggests many different quantization schemes. Still, the…

Machine Learning · Computer Science · 2018-06-19 · Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry

RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models

With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique at decreasing the model size, memory access, and compute load of large models.…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-05-26 · David Qiu, David Rim, Shaojin Ding, Oleg Rybakov, Yanzhang He

Fast Adjustable Threshold For Uniform Neural Network Quantization (Winning solution of LPIRC-II)

Neural network quantization procedure is the necessary step for porting of neural networks to mobile devices. Quantization allows accelerating the inference, reducing memory consumption and model size. It can be performed without…

Machine Learning · Computer Science · 2019-06-27 · Alexander Goncharenko, Andrey Denisov, Sergey Alyamkin, Evgeny Terentev

Improving Quantization-aware Training of Low-Precision Network via Block Replacement on Full-Precision Counterpart

Quantization-aware training (QAT) is a common paradigm for network quantization, in which the training phase incorporates the simulation of the low-precision computation to optimize the quantization parameters in alignment with the task…

Machine Learning · Computer Science · 2024-12-23 · Chengting Yu, Shu Yang, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Er-Ping Li

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

We introduce a method to train Quantized Neural Networks (QNNs) --- neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing…

Neural and Evolutionary Computing · Computer Science · 2016-09-23 · Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

A White Paper on Neural Network Quantization

While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge…

Machine Learning · Computer Science · 2021-06-16 · Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, Tijmen Blankevoort

Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers

Recent emerged quantization technique has been applied to inference of deep neural networks for fast and efficient execution. However,…

Machine Learning · Computer Science · 2020-03-06 · Xishan Zhang, Shaoli Liu, Rui Zhang, Chang Liu, Di Huang, Shiyi Zhou, Jiaming Guo, Yu Kang, Qi Guo, Zidong Du, Yunji Chen

Gradient-Free Training of Quantized Neural Networks

Training neural networks requires significant computational resources and energy. Methods like mixed-precision and quantization-aware training reduce bit usage, yet they still depend heavily on computationally expensive gradient-based…

Machine Learning · Computer Science · 2025-09-30 · Noa Cohen, Omkar Joglekar, Dotan Di Castro, Vladimir Tchuiev, Shir Kozlovsky, Michal Moshkovitz