Related papers: In-Hindsight Quantization Range Estimation for Qua…

200 papers

Network quantization generally converts full-precision weights and/or activations into low-bit fixed-point values in order to accelerate an inference process. Recent approaches to network quantization further discretize the gradients into…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Dohyung Kim, Junghyup Lee, Jeimin Jeon, Jaehyeon Moon, Bumsub Ham

Post-training quantization for reducing the storage of deep neural network models has been demonstrated to be an effective way in various tasks. However, low-bit quantization while maintaining model accuracy is a challenging problem. In…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Bingtao Yang, Yujia Wang, Mengzhi Jiao, Hongwei Huo

Deep neural networks have been proven effective in a wide range of tasks. However, their high computational and memory costs make them impractical to deploy on resource-constrained devices. To address this issue, quantization schemes have…

Computer Vision and Pattern Recognition · Computer Science 2023-03-14 Jie Hu, Mengze Zeng, Enhua Wu

Research has demonstrated that low bit-width (e.g., INT8) quantization can be employed to accelerate the inference process. This makes gradient quantization very promising, since the backward propagation requires approximately twice…

Computer Vision and Pattern Recognition · Computer Science 2021-02-10 Kang Zhao, Sida Huang, Pan Pan, Yinghan Li, Yingya Zhang, Zhenyu Gu, Yinghui Xu

Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models…

Computer Vision and Pattern Recognition · Computer Science 2021-09-20 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition. Motivated…

Computer Vision and Pattern Recognition · Computer Science 2021-08-25 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko

Quantization reduces computation costs of neural networks but suffers from performance degeneration. Is this accuracy drop due to the reduced capacity, or inefficient training during the quantization procedure? After looking into the…

Computer Vision and Pattern Recognition · Computer Science 2019-12-24 Qing Jin, Linjie Yang, Zhenyu Liao

Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major…

Machine Learning · Computer Science 2020-10-28 Jianfei Chen, Yu Gai, Zhewei Yao, Michael W. Mahoney, Joseph E. Gonzalez

Enabling low precision implementations of deep learning models, without considerable performance degradation, is necessary in resource and latency constrained settings. Moreover, exploiting the differences in sensitivity to quantization…

Machine Learning · Computer Science 2022-10-28 Ignacio Hounie, Juan Elenter, Alejandro Ribeiro

The use of low-bit quantization has emerged as an indispensable technique for enabling the efficient training of large-scale models. Despite its widespread empirical success, a rigorous theoretical understanding of its impact on learning…

Machine Learning · Statistics 2026-02-17 Dechen Zhang, Junwei Su, Difan Zou

Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only…

The ever-increasing computational complexity of deep learning models makes their training and deployment difficult on various cloud and edge platforms. Replacing floating-point arithmetic with low-bit integer arithmetic is a promising…

Machine Learning · Computer Science 2023-01-05 Alireza Ghaffari, Marzieh S. Tahaei, Mohammadreza Tayaranian, Masoud Asgharian, Vahid Partovi Nia

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be…

Machine Learning · Computer Science 2017-12-19 Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko

Weight quantization is an effective technique to compress deep neural networks for their deployment on edge devices with limited resources. Traditional loss-aware quantization methods commonly use the quantized gradient to replace the…

Machine Learning · Computer Science 2024-01-31 Lianbo Ma, Yuee Zhou, Jianlun Ma, Guo Yu, Qing Li

Neural network quantization has become an important research area due to its great impact on deployment of large models on resource constrained devices. In order to train networks that can be effectively discretized without loss of…

Machine Learning · Computer Science 2018-10-05 Christos Louizos, Matthias Reisser, Tijmen Blankevoort, Efstratios Gavves, Max Welling

Quantization is emerging as an efficient approach to promote hardware-friendly deep learning and run deep neural networks on resource-limited hardware. However, it still causes a significant decrease to the network in accuracy. We summarize…

Machine Learning · Computer Science 2021-12-03 Haotong Qin

Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions. In this paper we review the mathematical aspects of quantization…

Machine Learning · Computer Science 2020-04-22 Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, Paulius Micikevicius

Quantization-aware training (QAT) is a common paradigm for network quantization, in which the training phase incorporates the simulation of the low-precision computation to optimize the quantization parameters in alignment with the task…

Machine Learning · Computer Science 2024-12-23 Chengting Yu, Shu Yang, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Er-Ping Li

Existing deep learning methods have made significant progress in gait representation learning. Quantization can facilitate the application of gait models as a model-agnostic general compression technique. Typically, appearance-based models…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 S. Tian, H. Gao, G. Hong, S. Wang, J. Wang, X. Yu, S. Zhang

Quantification, also known as class prevalence estimation, is the supervised learning task in which a model is trained to predict the prevalence of each class in a given bag of examples. This paper investigates the application of deep…

Machine Learning · Computer Science 2024-03-25 Olaya Pérez-Mon, Alejandro Moreo, Juan José del Coz, Pablo González