Related papers: Is Integer Arithmetic Enough for Deep Learning Tra…

NITI: Training Integer Neural Networks Using Integer-only Arithmetic

While integer arithmetic has been widely adopted for improved performance in deep quantized neural network inference, training remains a task primarily executed using floating point arithmetic. This is because both high dynamic range and…

Computer Vision and Pattern Recognition · Computer Science · 2022-02-14 · Maolin Wang, Seyedramin Rasoulinezhad, Philip H. W. Leong, Hayden K. H. So

Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines

Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only…

Machine Learning · Computer Science · 2018-05-22 · Sean O. Settle, Manasa Bollavaram, Paolo D'Alberto, Elliott Delaye, Oscar Fernandez, Nicholas Fraser, Aaron Ng, Ashish Sirasao, Michael Wu

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be…

Machine Learning · Computer Science · 2017-12-19 · Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko

Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions. In this paper we review the mathematical aspects of quantization…

Machine Learning · Computer Science · 2020-04-22 · Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, Paulius Micikevicius

In-Hindsight Quantization Range Estimation for Quantized Training

Quantization techniques applied to the inference of deep neural networks have enabled fast and efficient execution on resource-constrained devices. The success of quantization during inference has motivated the academic community to explore…

Machine Learning · Computer Science · 2021-05-11 · Marios Fournarakis, Markus Nagel

Towards Efficient Verification of Quantized Neural Networks

Quantization replaces floating point arithmetic with integer arithmetic in deep neural network models, providing more efficient on-device inference with less power and memory. In this work, we propose a framework for formally verifying…

Machine Learning · Computer Science · 2023-12-29 · Pei Huang, Haoze Wu, Yuting Yang, Ieva Daukantas, Min Wu, Yedi Zhang, Clark Barrett

Towards Accurate and Efficient Sub-8-Bit Integer Training

Neural network training is a memory- and compute-intensive task. Quantization, which enables low-bitwidth formats in training, can significantly mitigate the workload. To reduce quantization error, recent methods have developed new data…

Machine Learning · Computer Science · 2024-11-19 · Wenjin Guo, Donglai Liu, Weiying Xie, Yunsong Li, Xuefei Ning, Zihan Meng, Shulin Zeng, Jie Lei, Zhenman Fang, Yu Wang

Bitwidth-Specific Logarithmic Arithmetic for Future Hardware-Accelerated Training

While advancements in quantization have significantly reduced the computational costs of inference in deep learning, training still predominantly relies on complex floating-point arithmetic. Low-precision fixed-point training presents a…

Machine Learning · Computer Science · 2025-10-21 · Hassan Hamad, Yuou Qiu, Peter A. Beerel, Keith M. Chugg

Training Quantized Nets: A Deeper Understanding

Currently, deep neural networks are deployed on low-power portable devices by first training a full-precision model using powerful hardware, and then deriving a corresponding low-precision model for efficient inference on such systems.…

Machine Learning · Computer Science · 2017-11-15 · Hao Li, Soham De, Zheng Xu, Christoph Studer, Hanan Samet, Tom Goldstein

Hadamard Domain Training with Integers for Class Incremental Quantized Learning

Continual learning is a desirable feature in many modern machine learning applications, which allows in-field adaptation and updating, ranging from accommodating distribution shift, to fine-tuning, and to learning new tasks. For…

Machine Learning · Computer Science · 2023-10-06 · Martin Schiemer, Clemens JS Schaefer, Jayden Parker Vap, Mark James Horeni, Yu Emma Wang, Juan Ye, Siddharth Joshi

Full Integer Arithmetic Online Training for Spiking Neural Networks

Spiking Neural Networks (SNNs) are promising for neuromorphic computing due to their biological plausibility and energy efficiency. However, training methods like Backpropagation Through Time (BPTT) and Real Time Recurrent Learning (RTRL)…

Neural and Evolutionary Computing · Computer Science · 2025-09-09 · Ismael Gomez, Guangzhi Tang

On the efficient representation and execution of deep acoustic models

In this paper we present a simple and computationally efficient quantization scheme that enables us to reduce the resolution of the parameters of a neural network from 32-bit floating point values to 8-bit integer values. The proposed…

Machine Learning · Computer Science · 2016-12-20 · Raziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin

NITRO-D: Native Integer-only Training of Deep Convolutional Neural Networks

Quantization is a pivotal technique for managing the growing computational and memory demands of Deep Neural Networks (DNNs). By reducing the number of bits used to represent weights and activations (typically from 32-bit Floating-Point…

Machine Learning · Computer Science · 2025-12-05 · Alberto Pirillo, Luca Colombo, Manuel Roveri

Low-bit Model Quantization for Deep Neural Networks: A Survey

With unprecedented rapid development, deep neural networks (DNNs) have deeply influenced almost all fields. However, their heavy computation costs and model sizes are usually unacceptable in real-world deployment. Model quantization, an…

Machine Learning · Computer Science · 2025-05-12 · Kai Liu, Qian Zheng, Kaiwen Tao, Zhiteng Li, Haotong Qin, Wenbo Li, Yong Guo, Xianglong Liu, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang

Scaled Quantization for the Vision Transformer

Quantization using a small number of bits shows promise for reducing latency and memory usage in deep neural networks. However, most quantization methods cannot readily handle complicated functions such as exponential and square root, and…

Image and Video Processing · Electrical Eng. & Systems · 2023-03-27 · Yangyang Chang, Gerald E. Sobelman

Rescaling-Aware Training for Efficient Deployment of Deep Learning Models on Full-Integer Hardware

Integer AI inference significantly reduces computational complexity in embedded systems. Quantization-aware training (QAT) helps mitigate accuracy degradation associated with post-training quantization but still overlooks the impact of…

Machine Learning · Computer Science · 2025-10-14 · Lion Mueller, Alberto Garcia-Ortiz, Ardalan Najafi, Adam Fuks, Lennart Bamberg

Low-Precision Floating-Point Schemes for Neural Network Training

The use of low-precision fixed-point arithmetic along with stochastic rounding has been proposed as a promising alternative to the commonly used 32-bit floating point arithmetic to enhance neural network training in terms of…

Machine Learning · Computer Science · 2018-04-17 · Marc Ortiz, Adrián Cristal, Eduard Ayguadé, Marc Casas

A Survey of Quantization Methods for Efficient Neural Network Inference

As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-23 · Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer

Neural Networks with Few Multiplications

For most deep learning algorithms training is notoriously time consuming. Since most of the computation in training neural networks is typically spent on floating point multiplications, we investigate an approach to training that eliminates…

Machine Learning · Computer Science · 2016-02-29 · Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio

Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients

Network quantization generally converts full-precision weights and/or activations into low-bit fixed-point values in order to accelerate an inference process. Recent approaches to network quantization further discretize the gradients into…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-18 · Dohyung Kim, Junghyup Lee, Jeimin Jeon, Jaehyeon Moon, Bumsub Ham