Related papers: Learning to Quantize Deep Networks by Optimizing Q…

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression and has a lot of potentials to increase inference speed leveraging bit-operations, there is still a noticeable gap in terms of…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Dongqing Zhang , Jiaolong Yang , Dongqiangzi Ye , Gang Hua

Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations

This paper tackles the problem of training a deep convolutional neural network of both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging due to the non-differentiability of the quantizer, which may…

Computer Vision and Pattern Recognition · Computer Science 2021-06-07 Bohan Zhuang , Jing Liu , Mingkui Tan , Lingqiao Liu , Ian Reid , Chunhua Shen

Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks

This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations. First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing…

Computer Vision and Pattern Recognition · Computer Science 2020-12-29 Tuan Hoang , Thanh-Toan Do , Tam V. Nguyen , Ngai-Man Cheung

Quantization Networks

Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network…

Computer Vision and Pattern Recognition · Computer Science 2019-12-02 Jiwei Yang , Xu Shen , Jun Xing , Xinmei Tian , Houqiang Li , Bing Deng , Jianqiang Huang , Xiansheng Hua

Learned Step Size Quantization

Deep networks run with low precision operations at inference time offer power and space advantages over high precision alternatives, but need to overcome the challenge of maintaining high accuracy as precision decreases. Here, we present a…

Machine Learning · Computer Science 2020-05-08 Steven K. Esser , Jeffrey L. McKinstry , Deepika Bablani , Rathinakumar Appuswamy , Dharmendra S. Modha

Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric Quantizer

Quantizing weights and activations of deep neural networks is essential for deploying them in resource-constrained devices, or cloud platforms for at-scale services. While binarization is a special case of quantization, this extreme case…

Computer Vision and Pattern Recognition · Computer Science 2021-04-02 Phuoc Pham , Jacob Abraham , Jaeyong Chung

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models…

Computer Vision and Pattern Recognition · Computer Science 2021-09-20 Ximeng Sun , Rameswar Panda , Chun-Fu Chen , Naigang Wang , Bowen Pan , Kailash Gopalakrishnan , Aude Oliva , Rogerio Feris , Kate Saenko

Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients

Network quantization generally converts full-precision weights and/or activations into low-bit fixed-point values in order to accelerate an inference process. Recent approaches to network quantization further discretize the gradients into…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Dohyung Kim , Junghyup Lee , Jeimin Jeon , Jaehyeon Moon , Bumsub Ham

WaveQ: Gradient-Based Deep Quantization of Neural Networks through Sinusoidal Adaptive Regularization

As deep neural networks make their ways into different domains, their compute efficiency is becoming a first-order constraint. Deep quantization, which reduces the bitwidth of the operations (below 8 bits), offers a unique opportunity as it…

Machine Learning · Computer Science 2020-04-27 Ahmed T. Elthakeb , Prannoy Pilligundla , Fatemehsadat Mireshghallah , Tarek Elgindi , Charles-Alban Deledalle , Hadi Esmaeilzadeh

FQ-Conv: Fully Quantized Convolution for Efficient and Accurate Inference

Deep neural networks (DNNs) can be made hardware-efficient by reducing the numerical precision of the weights and activations of the network and by improving the network's resilience to noise. However, this gain in efficiency often comes at…

Machine Learning · Computer Science 2019-12-20 Bram-Ernst Verhoef , Nathan Laubeuf , Stefan Cosemans , Peter Debacker , Ioannis Papistas , Arindam Mallik , Diederik Verkest

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

We introduce a method to train Quantized Neural Networks (QNNs) --- neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing…

Neural and Evolutionary Computing · Computer Science 2016-09-23 Itay Hubara , Matthieu Courbariaux , Daniel Soudry , Ran El-Yaniv , Yoshua Bengio

Quantizing deep convolutional networks for efficient inference: A whitepaper

We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision…

Machine Learning · Computer Science 2018-06-22 Raghuraman Krishnamoorthi

Bag of Tricks with Quantized Convolutional Neural Networks for image classification

Deep neural networks have been proven effective in a wide range of tasks. However, their high computational and memory costs make them impractical to deploy on resource-constrained devices. To address this issue, quantization schemes have…

Computer Vision and Pattern Recognition · Computer Science 2023-03-14 Jie Hu , Mengze Zeng , Enhua Wu

Adaptive Loss-aware Quantization for Multi-bit Networks

We investigate the compression of deep neural networks by quantizing their weights and activations into multiple binary bases, known as multi-bit networks (MBNs), which accelerate the inference and reduce the storage for the deployment on…

Computer Vision and Pattern Recognition · Computer Science 2020-07-07 Zhongnan Qu , Zimu Zhou , Yun Cheng , Lothar Thiele

Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

In low-latency or mobile applications, lower computation complexity, lower memory footprint and better energy efficiency are desired. Many prior works address this need by removing redundant parameters. Parameter quantization replaces…

Machine Learning · Computer Science 2021-11-16 Cheng-Chou Lan

Standard Deviation-Based Quantization for Deep Neural Networks

Quantization of deep neural networks is a promising approach that reduces the inference cost, making it feasible to run deep networks on resource-restricted devices. Inspired by existing methods, we propose a new framework to learn the…

Machine Learning · Computer Science 2022-02-28 Amir Ardakani , Arash Ardakani , Brett Meyer , James J. Clark , Warren J. Gross

ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks

Deep Neural Networks (DNNs) typically require massive amount of computation resource in inference tasks for computer vision applications. Quantization can significantly reduce DNN computation and storage by decreasing the bitwidth of…

Machine Learning · Computer Science 2020-04-17 Ahmed T. Elthakeb , Prannoy Pilligundla , FatemehSadat Mireshghallah , Amir Yazdanbakhsh , Hadi Esmaeilzadeh

DNQ: Dynamic Network Quantization

Network quantization is an effective method for the deployment of neural networks on memory and energy constrained mobile devices. In this paper, we propose a Dynamic Network Quantization (DNQ) framework which is composed of two modules: a…

Machine Learning · Computer Science 2018-12-07 Yuhui Xu , Shuai Zhang , Yingyong Qi , Jiaxian Guo , Weiyao Lin , Hongkai Xiong

Scalable Methods for 8-bit Training of Neural Networks

Quantized Neural Networks (QNNs) are often used to improve network efficiency during the inference phase, i.e. after the network has been trained. Extensive research in the field suggests many different quantization schemes. Still, the…

Machine Learning · Computer Science 2018-06-19 Ron Banner , Itay Hubara , Elad Hoffer , Daniel Soudry

Towards Accurate and Efficient Sub-8-Bit Integer Training

Neural network training is a memory- and compute-intensive task. Quantization, which enables low-bitwidth formats in training, can significantly mitigate the workload. To reduce quantization error, recent methods have developed new data…

Machine Learning · Computer Science 2024-11-19 Wenjin Guo , Donglai Liu , Weiying Xie , Yunsong Li , Xuefei Ning , Zihan Meng , Shulin Zeng , Jie Lei , Zhenman Fang , Yu Wang