Related papers: QuantNet: Learning to Quantize by Learning within …

Error-aware Quantization through Noise Tempering

Quantization has become a predominant approach for model compression, enabling deployment of large models trained on GPUs onto smaller form-factor devices for inference. Quantization-aware training (QAT) optimizes model parameters with…

Machine Learning · Computer Science 2022-12-13 Zheng Wang, Juncheng B Li, Shuhui Qu, Florian Metze, Emma Strubell

Progressive Element-wise Gradient Estimation for Neural Network Quantization

Neural network quantization aims to reduce the bit-widths of weights and activations, making it a critical technique for deploying deep neural networks on resource-constrained hardware. Most Quantization-Aware Training (QAT) methods rely on…

Machine Learning · Computer Science 2025-09-03 Kaiqi Zhao

Network Quantization with Element-wise Gradient Scaling

Network quantization aims at reducing bit-widths of weights and/or activations, particularly important for implementing deep neural networks with limited hardware resources. Most methods use the straight-through estimator (STE) to train…

Computer Vision and Pattern Recognition · Computer Science 2021-04-05 Junghyup Lee, Dohyung Kim, Bumsub Ham

Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric Quantizer

Quantizing weights and activations of deep neural networks is essential for deploying them in resource-constrained devices, or cloud platforms for at-scale services. While binarization is a special case of quantization, this extreme case…

Computer Vision and Pattern Recognition · Computer Science 2021-04-02 Phuoc Pham, Jacob Abraham, Jaeyong Chung

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression and has a lot of potentials to increase inference speed leveraging bit-operations, there is still a noticeable gap in terms of…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, Gang Hua

Beyond Discreteness: Finite-Sample Analysis of Straight-Through Estimator for Quantization

Training quantized neural networks requires addressing the non-differentiable and discrete nature of the underlying optimization problem. To tackle this challenge, the straight-through estimator (STE) has become the most widely adopted…

Machine Learning · Computer Science 2025-05-26 Halyun Jeong, Jack Xin, Penghang Yin

A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major…

Machine Learning · Computer Science 2020-10-28 Jianfei Chen, Yu Gai, Zhewei Yao, Michael W. Mahoney, Joseph E. Gonzalez

Quantization Networks

Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network…

Computer Vision and Pattern Recognition · Computer Science 2019-12-02 Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xiansheng Hua

Standard Deviation-Based Quantization for Deep Neural Networks

Quantization of deep neural networks is a promising approach that reduces the inference cost, making it feasible to run deep networks on resource-restricted devices. Inspired by existing methods, we propose a new framework to learn the…

Machine Learning · Computer Science 2022-02-28 Amir Ardakani, Arash Ardakani, Brett Meyer, James J. Clark, Warren J. Gross

Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks

The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for…

Machine Learning · Computer Science 2023-09-01 Clemens JS Schaefer, Siddharth Joshi, Shan Li, Raul Blazquez

Learning Quantized Neural Nets by Coarse Gradient Method for Non-linear Classification

Quantized or low-bit neural networks are attractive due to their inference efficiency. However, training deep neural networks with quantized activations involves minimizing a discontinuous and piecewise constant loss function. Such a loss…

Machine Learning · Computer Science 2021-06-15 Ziang Long, Penghang Yin, Jack Xin

Differentiable Fine-grained Quantization for Deep Neural Network Compression

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science 2018-11-14 Hsin-Pai Cheng, Yuanjun Huang, Xuyang Guo, Yifei Huang, Feng Yan, Hai Li, Yiran Chen

Distance-aware Quantization

We address the problem of network quantization, that is, reducing bit-widths of weights and/or activations to lighten network architectures. Quantization methods use a rounding function to map full-precision values to the nearest quantized…

Computer Vision and Pattern Recognition · Computer Science 2021-08-17 Dohyung Kim, Junghyup Lee, Bumsub Ham

Learning low-precision neural networks without Straight-Through Estimator(STE)

The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding. We propose an alternative methodology called…

Machine Learning · Computer Science 2019-05-22 Zhi-Gang Liu, Matthew Mattina

Differentiable, Bit-shifting, and Scalable Quantization without training neural network from scratch

Quantization of neural networks provides benefits of inference in less compute and memory requirements. Previous work in quantization lack two important aspects which this work provides. First almost all previous work in quantization used a…

Computer Vision and Pattern Recognition · Computer Science 2025-12-12 Zia Badar

Blended Coarse Gradient Descent for Full Quantization of Deep Neural Networks

Quantized deep neural networks (QDNNs) are attractive due to their much lower memory storage and faster inference speed than their regular full precision counterparts. To maintain the same performance level especially at low bit-widths,…

Machine Learning · Computer Science 2019-01-08 Penghang Yin, Shuai Zhang, Jiancheng Lyu, Stanley Osher, Yingyong Qi, Jack Xin

Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

Quantized Neural Networks (QNNs), which use low bitwidth numbers for representing parameters and performing computations, have been proposed to reduce the computation complexity, storage size and memory usage. In QNNs, parameters and…

Computer Vision and Pattern Recognition · Computer Science 2017-06-23 Shuchang Zhou, Yuzhi Wang, He Wen, Qinyao He, Yuheng Zou

Bit Efficient Quantization for Deep Neural Networks

Quantization for deep neural networks have afforded models for edge devices that use less on-board memory and enable efficient low-power inference. In this paper, we present a comparison of model-parameter driven quantization approaches…

Computer Vision and Pattern Recognition · Computer Science 2019-10-14 Prateeth Nayak, David Zhang, Sek Chai

BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization

Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks, and thus, have been widely investigated. However, it lacks a systematic method to determine the…

Machine Learning · Computer Science 2021-02-23 Huanrui Yang, Lin Duan, Yiran Chen, Hai Li

Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

In low-latency or mobile applications, lower computation complexity, lower memory footprint and better energy efficiency are desired. Many prior works address this need by removing redundant parameters. Parameter quantization replaces…

Machine Learning · Computer Science 2021-11-16 Cheng-Chou Lan