Related papers: In-Hindsight Quantization Range Estimation for Qua…

Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients

Network quantization generally converts full-precision weights and/or activations into low-bit fixed-point values in order to accelerate an inference process. Recent approaches to network quantization further discretize the gradients into…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Dohyung Kim , Junghyup Lee , Jeimin Jeon , Jaehyeon Moon , Bumsub Ham

Quantization Range Estimation for Convolutional Neural Networks

Post-training quantization for reducing the storage of deep neural network models has been demonstrated to be an effective way in various tasks. However, low-bit quantization while maintaining model accuracy is a challenging problem. In…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Bingtao Yang , Yujia Wang , Mengzhi Jiao , Hongwei Huo

Bag of Tricks with Quantized Convolutional Neural Networks for image classification

Deep neural networks have been proven effective in a wide range of tasks. However, their high computational and memory costs make them impractical to deploy on resource-constrained devices. To address this issue, quantization schemes have…

Computer Vision and Pattern Recognition · Computer Science 2023-03-14 Jie Hu , Mengze Zeng , Enhua Wu

Distribution Adaptive INT8 Quantization for Training CNNs

Researches have demonstrated that low bit-width (e.g., INT8) quantization can be employed to accelerate the inference process. It makes the gradient quantization very promising since the backward propagation requires approximately twice…

Computer Vision and Pattern Recognition · Computer Science 2021-02-10 Kang Zhao , Sida Huang , Pan Pan , Yinghan Li , Yingya Zhang , Zhenyu Gu , Yinghui Xu

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models…

Computer Vision and Pattern Recognition · Computer Science 2021-09-20 Ximeng Sun , Rameswar Panda , Chun-Fu Chen , Naigang Wang , Bowen Pan , Kailash Gopalakrishnan , Aude Oliva , Rogerio Feris , Kate Saenko

Dynamic Network Quantization for Efficient Video Inference

Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition. Motivated…

Computer Vision and Pattern Recognition · Computer Science 2021-08-25 Ximeng Sun , Rameswar Panda , Chun-Fu Chen , Aude Oliva , Rogerio Feris , Kate Saenko

Towards Efficient Training for Neural Network Quantization

Quantization reduces computation costs of neural networks but suffers from performance degeneration. Is this accuracy drop due to the reduced capacity, or inefficient training during the quantization procedure? After looking into the…

Computer Vision and Pattern Recognition · Computer Science 2019-12-24 Qing Jin , Linjie Yang , Zhenyu Liao

A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major…

Machine Learning · Computer Science 2020-10-28 Jianfei Chen , Yu Gai , Zhewei Yao , Michael W. Mahoney , Joseph E. Gonzalez

Neural Networks with Quantization Constraints

Enabling low precision implementations of deep learning models, without considerable performance degradation, is necessary in resource and latency constrained settings. Moreover, exploiting the differences in sensitivity to quantization…

Machine Learning · Computer Science 2022-10-28 Ignacio Hounie , Juan Elenter , Alejandro Ribeiro

Learning under Quantization for High-Dimensional Linear Regression

The use of low-bit quantization has emerged as an indispensable technique for enabling the efficient training of large-scale models. Despite its widespread empirical success, a rigorous theoretical understanding of its impact on learning…

Machine Learning · Statistics 2026-02-17 Dechen Zhang , Junwei Su , Difan Zou

Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines

Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only…

Machine Learning · Computer Science 2018-05-22 Sean O. Settle , Manasa Bollavaram , Paolo D'Alberto , Elliott Delaye , Oscar Fernandez , Nicholas Fraser , Aaron Ng , Ashish Sirasao , Michael Wu

Is Integer Arithmetic Enough for Deep Learning Training?

The ever-increasing computational complexity of deep learning models makes their training and deployment difficult on various cloud and edge platforms. Replacing floating-point arithmetic with low-bit integer arithmetic is a promising…

Machine Learning · Computer Science 2023-01-05 Alireza Ghaffari , Marzieh S. Tahaei , Mohammadreza Tayaranian , Masoud Asgharian , Vahid Partovi Nia

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be…

Machine Learning · Computer Science 2017-12-19 Benoit Jacob , Skirmantas Kligys , Bo Chen , Menglong Zhu , Matthew Tang , Andrew Howard , Hartwig Adam , Dmitry Kalenichenko

One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware Quantization Training

Weight quantization is an effective technique to compress deep neural networks for their deployment on edge devices with limited resources. Traditional loss-aware quantization methods commonly use the quantized gradient to replace the…

Machine Learning · Computer Science 2024-01-31 Lianbo Ma , Yuee Zhou , Jianlun Ma , Guo Yu , Qing Li

Relaxed Quantization for Discretized Neural Networks

Neural network quantization has become an important research area due to its great impact on deployment of large models on resource constrained devices. In order to train networks that can be effectively discretized without loss of…

Machine Learning · Computer Science 2018-10-05 Christos Louizos , Matthias Reisser , Tijmen Blankevoort , Efstratios Gavves , Max Welling

Hardware-friendly Deep Learning by Network Quantization and Binarization

Quantization is emerging as an efficient approach to promote hardware-friendly deep learning and run deep neural networks on resource-limited hardware. However, it still causes a significant decrease to the network in accuracy. We summarize…

Machine Learning · Computer Science 2021-12-03 Haotong Qin

Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions. In this paper we review the mathematical aspects of quantization…

Machine Learning · Computer Science 2020-04-22 Hao Wu , Patrick Judd , Xiaojie Zhang , Mikhail Isaev , Paulius Micikevicius

Improving Quantization-aware Training of Low-Precision Network via Block Replacement on Full-Precision Counterpart

Quantization-aware training (QAT) is a common paradigm for network quantization, in which the training phase incorporates the simulation of the low-precision computation to optimize the quantization parameters in alignment with the task…

Machine Learning · Computer Science 2024-12-23 Chengting Yu , Shu Yang , Fengzhao Zhang , Hanzhi Ma , Aili Wang , Er-Ping Li

Accurate Quantization for Gait Representation Learning

Existing deep learning methods have made significant progress in gait representation learning. Quantization can facilitate the application of gait models as a model-agnostic general compression technique. Typically, appearance-based models…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 S. Tian , H. Gao , G. Hong , S. Wang , J. Wang , X. Yu , S. Zhang

Quantification using Permutation-Invariant Networks based on Histograms

Quantification, also known as class prevalence estimation, is the supervised learning task in which a model is trained to predict the prevalence of each class in a given bag of examples. This paper investigates the application of deep…

Machine Learning · Computer Science 2024-03-25 Olaya Pérez-Mon , Alejandro Moreo , Juan José del Coz , Pablo González