Related papers: Network Quantization with Element-wise Gradient Sc…

Progressive Element-wise Gradient Estimation for Neural Network Quantization

Neural network quantization aims to reduce the bit-widths of weights and activations, making it a critical technique for deploying deep neural networks on resource-constrained hardware. Most Quantization-Aware Training (QAT) methods rely on…

Machine Learning · Computer Science 2025-09-03 Kaiqi Zhao

QuantNet: Learning to Quantize by Learning within Fully Differentiable Framework

Despite the achievements of recent binarization methods on reducing the performance degradation of Binary Neural Networks (BNNs), gradient mismatching caused by the Straight-Through-Estimator (STE) still dominates quantized networks. This…

Computer Vision and Pattern Recognition · Computer Science 2020-09-11 Junjie Liu, Dongchao Wen, Deyu Wang, Wei Tao, Tse-Wei Chen, Kinya Osa, Masami Kato

Error-aware Quantization through Noise Tempering

Quantization has become a predominant approach for model compression, enabling deployment of large models trained on GPUs onto smaller form-factor devices for inference. Quantization-aware training (QAT) optimizes model parameters with…

Machine Learning · Computer Science 2022-12-13 Zheng Wang, Juncheng B Li, Shuhui Qu, Florian Metze, Emma Strubell

Learning Quantized Neural Nets by Coarse Gradient Method for Non-linear Classification

Quantized or low-bit neural networks are attractive due to their inference efficiency. However, training deep neural networks with quantized activations involves minimizing a discontinuous and piecewise constant loss function. Such a loss…

Machine Learning · Computer Science 2021-06-15 Ziang Long, Penghang Yin, Jack Xin

Beyond Discreteness: Finite-Sample Analysis of Straight-Through Estimator for Quantization

Training quantized neural networks requires addressing the non-differentiable and discrete nature of the underlying optimization problem. To tackle this challenge, the straight-through estimator (STE) has become the most widely adopted…

Machine Learning · Computer Science 2025-05-26 Halyun Jeong, Jack Xin, Penghang Yin

Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients

Network quantization generally converts full-precision weights and/or activations into low-bit fixed-point values in order to accelerate an inference process. Recent approaches to network quantization further discretize the gradients into…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Dohyung Kim, Junghyup Lee, Jeimin Jeon, Jaehyeon Moon, Bumsub Ham

GDNSQ: Gradual Differentiable Noise Scale Quantization for Low-bit Neural Networks

Quantized neural networks can be viewed as a chain of noisy channels, where rounding in each layer reduces capacity as bit-width shrinks; the floating-point (FP) checkpoint sets the maximum input rate. We track capacity dynamics as the…

Machine Learning · Computer Science 2025-11-12 Sergey Salishev, Ian Akhremchik

Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets

Training activation quantized neural networks involves minimizing a piecewise constant function whose gradient vanishes almost everywhere, which is undesirable for the standard back-propagation or chain rule. An empirical way around this…

Machine Learning · Computer Science 2019-09-26 Penghang Yin, Jiancheng Lyu, Shuai Zhang, Stanley Osher, Yingyong Qi, Jack Xin

Robust Training of Neural Networks at Arbitrary Precision and Sparsity

The discontinuous operations inherent in quantization and sparsification introduce a long-standing obstacle to backpropagation, particularly in ultra-low precision and sparse regimes. While the community has long viewed quantization as…

Machine Learning · Computer Science 2026-03-11 Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Li Zhang, Mark Sandler, Andrew Howard

Custom Gradient Estimators are Straight-Through Estimators in Disguise

Quantization-aware training comes with a fundamental challenge: the derivative of quantization functions such as rounding are zero almost everywhere and nonexistent elsewhere. Various differentiable approximations of quantization functions…

Machine Learning · Computer Science 2024-05-24 Matt Schoenbauer, Daniele Moro, Lukasz Lew, Andrew Howard

Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss

Network quantization, which aims to reduce the bit-lengths of the network weights and activations, has emerged for their deployments to resource-limited devices. Although recent studies have successfully discretized a full-precision…

Machine Learning · Computer Science 2021-09-07 Jung Hyun Lee, Jihun Yun, Sung Ju Hwang, Eunho Yang

StatQAT: Statistical Quantizer Optimization for Deep Networks

Quantization is essential for reducing the computational cost and memory usage of deep neural networks, enabling efficient inference on low-precision hardware. Despite the growing adoption of uniform and floating-point quantization schemes,…

Machine Learning · Statistics 2026-05-19 Mehmet Aktukmak, Daniel Huang, Ke Ding

Learning low-precision neural networks without Straight-Through Estimator(STE)

The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding. We propose an alternative methodology called…

Machine Learning · Computer Science 2019-05-22 Zhi-Gang Liu, Matthew Mattina

Towards Efficient Training for Neural Network Quantization

Quantization reduces computation costs of neural networks but suffers from performance degeneration. Is this accuracy drop due to the reduced capacity, or inefficient training during the quantization procedure? After looking into the…

Computer Vision and Pattern Recognition · Computer Science 2019-12-24 Qing Jin, Linjie Yang, Zhenyu Liao

Distance-aware Quantization

We address the problem of network quantization, that is, reducing bit-widths of weights and/or activations to lighten network architectures. Quantization methods use a rounding function to map full-precision values to the nearest quantized…

Computer Vision and Pattern Recognition · Computer Science 2021-08-17 Dohyung Kim, Junghyup Lee, Bumsub Ham

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models…

Computer Vision and Pattern Recognition · Computer Science 2021-09-20 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

EasyQuant: Post-training Quantization via Scale Optimization

The 8 bits quantization has been widely applied to accelerate network inference in various deep learning applications. There are two kinds of quantization methods, training-based quantization and post-training quantization. Training-based…

Computer Vision and Pattern Recognition · Computer Science 2020-07-01 Di Wu, Qi Tang, Yongle Zhao, Ming Zhang, Ying Fu, Debing Zhang

Propagating Asymptotic-Estimated Gradients for Low Bitwidth Quantized Neural Networks

The quantized neural networks (QNNs) can be useful for neural network acceleration and compression, but during the training process they pose a challenge: how to propagate the gradient of loss function through the graph flow with a…

Machine Learning · Computer Science 2020-03-26 Jun Chen, Yong Liu, Hao Zhang, Shengnan Hou, Jian Yang

Distribution Adaptive INT8 Quantization for Training CNNs

Researches have demonstrated that low bit-width (e.g., INT8) quantization can be employed to accelerate the inference process. It makes the gradient quantization very promising since the backward propagation requires approximately twice…

Computer Vision and Pattern Recognition · Computer Science 2021-02-10 Kang Zhao, Sida Huang, Pan Pan, Yinghan Li, Yingya Zhang, Zhenyu Gu, Yinghui Xu

Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss

Reducing bit-widths of activations and weights of deep networks makes it efficient to compute and store them in memory, which is crucial in their deployments to resource-limited devices, such as mobile phones. However, decreasing bit-widths…

Computer Vision and Pattern Recognition · Computer Science 2018-11-26 Sangil Jung, Changyong Son, Seohyung Lee, Jinwoo Son, Youngjun Kwak, Jae-Joon Han, Sung Ju Hwang, Changkyu Choi