Related papers: Standard Deviation-Based Quantization for Deep Neu…

Learned Step Size Quantization

Deep networks run with low precision operations at inference time offer power and space advantages over high precision alternatives, but need to overcome the challenge of maintaining high accuracy as precision decreases. Here, we present a…

Machine Learning · Computer Science · 2020-05-08 · Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha

Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference

Deep neural networks have achieved state-of-the-art results in a wide range of applications, from natural language processing and computer vision to speech recognition. However, as tasks become increasingly complex, model sizes continue to…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-21 · Tomer Gafni, Asaf Karnieli, Yair Hanani

Bit Efficient Quantization for Deep Neural Networks

Quantization for deep neural networks has afforded models for edge devices that use less on-board memory and enable efficient low-power inference. In this paper, we present a comparison of model-parameter driven quantization approaches…

Computer Vision and Pattern Recognition · Computer Science · 2019-10-14 · Prateeth Nayak, David Zhang, Sek Chai

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression and has a lot of potential to increase inference speed by leveraging bit operations, there is still a noticeable gap in terms of…

Computer Vision and Pattern Recognition · Computer Science · 2018-07-27 · Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, Gang Hua

Quantization Networks

Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network…

Computer Vision and Pattern Recognition · Computer Science · 2019-12-02 · Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xiansheng Hua

Differentiable Fine-grained Quantization for Deep Neural Network Compression

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-14 · Hsin-Pai Cheng, Yuanjun Huang, Xuyang Guo, Yifei Huang, Feng Yan, Hai Li, Yiran Chen

Quantizing deep convolutional networks for efficient inference: A whitepaper

We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision…

Machine Learning · Computer Science · 2018-06-22 · Raghuraman Krishnamoorthi
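The whitepaper entry above contrasts per-channel quantization of weights with per-layer quantization. A minimal sketch of why a per-channel scale can help, assuming symmetric max-abs scaling onto the integer range [-127, 127] and plain Python lists (all function names here are hypothetical, not taken from the whitepaper):

```python
# Sketch: symmetric 8-bit quantization of a weight matrix, comparing a single
# scale for the whole tensor against one scale per output channel (row).
# Function names are hypothetical; the integer range [-127, 127] is assumed.

def symmetric_scale(values, num_bits=8):
    """Scale that maps the largest |value| onto the largest positive integer."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize_symmetric(values, scale, num_bits=8):
    """Round values / scale to the nearest integer and clamp to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(qvalues, scale):
    """Map integers back to approximate real values."""
    return [q * scale for q in qvalues]


def per_tensor_scales(weights):
    """One shared scale, computed from every element of the matrix."""
    s = symmetric_scale([v for row in weights for v in row])
    return [s] * len(weights)


def per_channel_scales(weights):
    """One scale per row, computed from that row's own range."""
    return [symmetric_scale(row) for row in weights]


def row_errors(weights, scales):
    """Largest absolute reconstruction error in each row."""
    errors = []
    for row, s in zip(weights, scales):
        recon = dequantize(quantize_symmetric(row, s), s)
        errors.append(max(abs(a - b) for a, b in zip(row, recon)))
    return errors


if __name__ == "__main__":
    # Hypothetical weights: row 0 spans a much wider range than row 1.
    w = [[2.0, -1.5, 0.75], [0.01, -0.02, 0.015]]
    print("per-tensor: ", row_errors(w, per_tensor_scales(w)))
    print("per-channel:", row_errors(w, per_channel_scales(w)))
```

With the shared scale (about 0.0157), every entry of row 1 rounds to 1 or -1, so its relative error is large; with its own scale, row 1 uses most of the integer range while row 0 is unchanged.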

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-20 · Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

Loss-aware Weight Quantization of Deep Networks

The huge size of deep networks hinders their use in small computing devices. In this paper, we consider compressing the network by weight quantization. We extend a recently proposed loss-aware weight binarization scheme to ternarization,…

Machine Learning · Computer Science · 2018-05-11 · Lu Hou, James T. Kwok

Differentiable, Bit-shifting, and Scalable Quantization without training neural network from scratch

Quantization of neural networks provides the benefit of inference with lower compute and memory requirements. Previous work in quantization lacks two important aspects, which this work provides. First, almost all previous work in quantization used a…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-12 · Zia Badar

Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks

This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations. First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-29 · Tuan Hoang, Thanh-Toan Do, Tam V. Nguyen, Ngai-Man Cheung

On the efficient representation and execution of deep acoustic models

In this paper we present a simple and computationally efficient quantization scheme that enables us to reduce the resolution of the parameters of a neural network from 32-bit floating point values to 8-bit integer values. The proposed…

Machine Learning · Computer Science · 2016-12-20 · Raziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin
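The acoustic-model entry above describes reducing parameters from 32-bit floating point to 8-bit integers. As a point of reference only, the sketch below uses a generic min/max affine scheme with a scale and zero point (an assumed textbook scheme, not necessarily the one proposed in that paper; all names are hypothetical):

```python
# Sketch: generic min/max affine quantization of float values to 8-bit
# unsigned integers. This is an assumed textbook scheme, not necessarily the
# one proposed in the paper above.

def affine_params(values, num_bits=8):
    """Pick a scale and integer zero point mapping [lo, hi] onto [0, 2^b - 1].

    The range is widened to include 0.0 so that zero is exactly representable.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2^b - 1], clamping out-of-range values."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Recover approximate floats as (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qvalues]


if __name__ == "__main__":
    x = [-1.0, 0.0, 0.5, 3.0]
    s, z = affine_params(x)
    q = quantize(x, s, z)
    print(s, z, q, dequantize(q, s, z))
```

For the example input, the scale is 4/255, the zero point is 64, and the integers are [0, 64, 96, 255]; each reconstructed value is within half a quantization step of the original. Widening the range to include 0.0 keeps zero exact, which matters for padding and ReLU outputs.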

Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations

This paper tackles the problem of training a deep convolutional neural network with both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging due to the non-differentiability of the quantizer, which may…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-07 · Bohan Zhuang, Jing Liu, Mingkui Tan, Lingqiao Liu, Ian Reid, Chunhua Shen

Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines

Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only…

Machine Learning · Computer Science · 2018-05-22 · Sean O. Settle, Manasa Bollavaram, Paolo D'Alberto, Elliott Delaye, Oscar Fernandez, Nicholas Fraser, Aaron Ng, Ashish Sirasao, Michael Wu

QuantNet: Learning to Quantize by Learning within Fully Differentiable Framework

Despite the achievements of recent binarization methods on reducing the performance degradation of Binary Neural Networks (BNNs), gradient mismatching caused by the Straight-Through-Estimator (STE) still dominates quantized networks. This…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-11 · Junjie Liu, Dongchao Wen, Deyu Wang, Wei Tao, Tse-Wei Chen, Kinya Osa, Masami Kato
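The QuantNet abstract attributes gradient mismatch to the Straight-Through Estimator (STE). As background only, the sketch below shows the generic clipped STE commonly used for binary networks (not QuantNet's proposed method; function names and the clip threshold of 1.0 are assumptions):

```python
# Sketch: sign binarization with a clipped straight-through estimator (STE).
# This is the generic STE used by many binary-network papers, shown only as
# background; it is not QuantNet's proposed method.

def binarize_forward(w):
    """Forward pass: sign quantizer, with 0 mapped to +1 (a common convention)."""
    return [1.0 if x >= 0 else -1.0 for x in w]


def true_sign_grad(w):
    """The exact derivative of sign() is 0 wherever it is defined."""
    return [0.0 for _ in w]


def binarize_backward_ste(w, grad_out, clip=1.0):
    """STE backward pass: pass grad_out through unchanged where |w| <= clip,
    and zero it where |w| > clip. The gap between this surrogate and
    true_sign_grad is one common reading of 'gradient mismatch'."""
    return [g if abs(x) <= clip else 0.0 for x, g in zip(w, grad_out)]


if __name__ == "__main__":
    w = [0.3, -0.7, 1.5, -0.2]       # hypothetical latent real-valued weights
    grad_out = [0.1, 0.2, 0.3, 0.4]  # hypothetical dL/d(binarized weight)
    print(binarize_forward(w))
    print(binarize_backward_ste(w, grad_out))
```

Here the forward pass yields [1.0, -1.0, 1.0, -1.0] and the STE gradient is [0.1, 0.2, 0.0, 0.4], whereas the exact gradient of sign() would be all zeros and would stop learning entirely.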

FQ-Conv: Fully Quantized Convolution for Efficient and Accurate Inference

Deep neural networks (DNNs) can be made hardware-efficient by reducing the numerical precision of the weights and activations of the network and by improving the network's resilience to noise. However, this gain in efficiency often comes at…

Machine Learning · Computer Science · 2019-12-20 · Bram-Ernst Verhoef, Nathan Laubeuf, Stefan Cosemans, Peter Debacker, Ioannis Papistas, Arindam Mallik, Diederik Verkest

Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks

The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for…

Machine Learning · Computer Science · 2023-09-01 · Clemens JS Schaefer, Siddharth Joshi, Shan Li, Raul Blazquez

Adaptive Quantization for Deep Neural Network

In recent years Deep Neural Networks (DNNs) have been rapidly developed in various applications, together with increasingly complex architectures. The performance gain of these DNNs generally comes with high computational costs and large…

Machine Learning · Computer Science · 2017-12-05 · Yiren Zhou, Seyed-Mohsen Moosavi-Dezfooli, Ngai-Man Cheung, Pascal Frossard

Progressive Stochastic Binarization of Deep Networks

A plethora of recent research has focused on improving the memory footprint and inference speed of deep networks by reducing the complexity of (i) numerical representations (for example, by deterministic or stochastic quantization) and (ii)…

Machine Learning · Computer Science · 2019-04-05 · David Hartmann, Michael Wand

Low Precision Quantization-aware Training in Spiking Neural Networks with Differentiable Quantization Function

Deep neural networks have been proven to be highly effective tools in various domains, yet their computational and memory costs restrict them from being widely deployed on portable devices. The recent rapid increase of edge computing…

Neural and Evolutionary Computing · Computer Science · 2023-06-01 · Ayan Shymyrbay, Mohammed E. Fouda, Ahmed Eltawil