English
Related papers

Related papers: Integer-Only Neural Network Quantization Scheme Ba…

200 papers

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be…

Machine Learning · Computer Science 2017-12-19 Benoit Jacob , Skirmantas Kligys , Bo Chen , Menglong Zhu , Matthew Tang , Andrew Howard , Hartwig Adam , Dmitry Kalenichenko

Integer quantization of neural networks can be defined as the approximation of the high precision computation of the canonical neural network formulation, using reduced integer precision. It plays a significant role in the efficient…

Machine Learning · Computer Science 2021-01-15 Jian Li , Raziel Alvarez

Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions. In this paper we review the mathematical aspects of quantization…

Machine Learning · Computer Science 2020-04-22 Hao Wu , Patrick Judd , Xiaojie Zhang , Mikhail Isaev , Paulius Micikevicius

Recent machine learning methods use increasingly large deep neural networks to achieve state of the art results in various tasks. The gains in performance come at the cost of a substantial increase in computation and storage requirements.…

Machine Learning · Computer Science 2019-03-26 Yoni Choukroun , Eli Kravchik , Fan Yang , Pavel Kisilev

Neural network training is a memory- and compute-intensive task. Quantization, which enables low-bitwidth formats in training, can significantly mitigate the workload. To reduce quantization error, recent methods have developed new data…

Machine Learning · Computer Science 2024-11-19 Wenjin Guo , Donglai Liu , Weiying Xie , Yunsong Li , Xuefei Ning , Zihan Meng , Shulin Zeng , Jie Lei , Zhenman Fang , Yu Wang

Neural network quantization is a promising compression technique to reduce memory footprint and save energy consumption, potentially leading to real-time inference. However, there is a performance gap between quantized and full-precision…

Computer Vision and Pattern Recognition · Computer Science 2022-02-11 Qing Jin , Jian Ren , Richard Zhuang , Sumant Hanumante , Zhengang Li , Zhiyu Chen , Yanzhi Wang , Kaiyuan Yang , Sergey Tulyakov

Quantized Neural Networks (QNNs), which use low bitwidth numbers for representing parameters and performing computations, have been proposed to reduce the computation complexity, storage size and memory usage. In QNNs, parameters and…

Computer Vision and Pattern Recognition · Computer Science 2017-06-23 Shuchang Zhou , Yuzhi Wang , He Wen , Qinyao He , Yuheng Zou

Recurrent neural networks (RNN) are used in many real-world text and speech applications. They include complex modules such as recurrence, exponential-based activation, gate interaction, unfoldable normalization, bi-directional dependence,…

Machine Learning · Computer Science 2022-02-16 Eyyüb Sari , Vanessa Courville , Vahid Partovi Nia

Quantised neural networks (QNNs) shrink models and reduce inference energy through low-bit arithmetic, yet most still depend on a running statistics batch normalisation (BN) layer, preventing true integer-only deployment. Prior attempts…

Machine Learning · Computer Science 2025-12-19 Pengfei Sun , Wenyu Jiang , Piew Yoong Chee , Paul Devos , Dick Botteldooren

Recurrent neural networks (RNN) are the backbone of many text and speech applications. These architectures are typically made up of several computationally complex components such as; non-linear activation functions, normalization,…

Machine Learning · Computer Science 2022-12-23 Vahid Partovi Nia , Eyyüb Sari , Vanessa Courville , Masoud Asgharian

As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related…

Computer Vision and Pattern Recognition · Computer Science 2021-06-23 Amir Gholami , Sehoon Kim , Zhen Dong , Zhewei Yao , Michael W. Mahoney , Kurt Keutzer

Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive efficient inference…

Computation and Language · Computer Science 2022-05-02 Sehoon Kim , Amir Gholami , Zhewei Yao , Michael W. Mahoney , Kurt Keutzer

Quantization is a pivotal technique for managing the growing computational and memory demands of Deep Neural Networks (DNNs). By reducing the number of bits used to represent weights and activations (typically from 32-bit Floating-Point…

Machine Learning · Computer Science 2025-12-05 Alberto Pirillo , Luca Colombo , Manuel Roveri

In this paper we present a simple and computationally efficient quantization scheme that enables us to reduce the resolution of the parameters of a neural network from 32-bit floating point values to 8-bit integer values. The proposed…

Machine Learning · Computer Science 2016-12-20 Raziel Alvarez , Rohit Prabhavalkar , Anton Bakhtin

Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only…

Quantized Neural Networks (QNNs) are often used to improve network efficiency during the inference phase, i.e. after the network has been trained. Extensive research in the field suggests many different quantization schemes. Still, the…

Machine Learning · Computer Science 2018-06-19 Ron Banner , Itay Hubara , Elad Hoffer , Daniel Soudry

We present PrecisionBatching, a quantized inference algorithm for speeding up neural network execution on traditional hardware platforms at low bitwidths without the need for retraining or recalibration. PrecisionBatching decomposes a…

Machine Learning · Computer Science 2020-03-03 Maximilian Lam , Zachary Yedidia , Colby Banbury , Vijay Janapa Reddi

Quantization is emerging as an efficient approach to promote hardware-friendly deep learning and run deep neural networks on resource-limited hardware. However, it still causes a significant decrease to the network in accuracy. We summarize…

Machine Learning · Computer Science 2021-12-03 Haotong Qin

Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network…

Computer Vision and Pattern Recognition · Computer Science 2019-12-02 Jiwei Yang , Xu Shen , Jun Xing , Xinmei Tian , Houqiang Li , Bing Deng , Jianqiang Huang , Xiansheng Hua

Integer-arithmetic-only networks have been demonstrated effective to reduce computational cost and to ensure cross-platform consistency. However, previous works usually report a decline in the inference accuracy when converting well-trained…

Computer Vision and Pattern Recognition · Computer Science 2020-06-23 Hengrui Zhao , Dong Liu , Houqiang Li
‹ Prev 1 2 3 10 Next ›