English
Related papers

Related papers: Quantization without Tears

200 papers

Network quantization is arguably one of the most practical network compression approaches for reducing the enormous resource consumption of modern deep neural networks. They usually require diverse and subtle design choices for specific…

Computer Vision and Pattern Recognition · Computer Science 2025-05-28 Ningyuan Tang , Minghao Fu , Hao Yu , Jianxin Wu

While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge…

Machine Learning · Computer Science 2021-06-16 Markus Nagel , Marios Fournarakis , Rana Ali Amjad , Yelysei Bondarenko , Mart van Baalen , Tijmen Blankevoort

With the development of deep neural networks, the size of network models becomes larger and larger. Model compression has become an urgent need for deploying these network models to mobile or embedded devices. Model quantization is a…

Machine Learning · Computer Science 2019-07-02 Wen-Pu Cai , Wu-Jun Li

Quantization is essential for reducing the computational cost and memory usage of deep neural networks, enabling efficient inference on low-precision hardware. Despite the growing adoption of uniform and floating-point quantization schemes,…

Machine Learning · Statistics 2026-05-19 Mehmet Aktukmak , Daniel Huang , Ke Ding

Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network…

Computer Vision and Pattern Recognition · Computer Science 2019-12-02 Jiwei Yang , Xu Shen , Jun Xing , Xinmei Tian , Houqiang Li , Bing Deng , Jianqiang Huang , Xiansheng Hua

We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit…

Machine Learning · Computer Science 2019-11-26 Markus Nagel , Mart van Baalen , Tijmen Blankevoort , Max Welling

Deep neural networks (DNNs) can be made hardware-efficient by reducing the numerical precision of the weights and activations of the network and by improving the network's resilience to noise. However, this gain in efficiency often comes at…

Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) represent two mainstream model quantization approaches. However, PTQ often leads to unacceptable performance degradation in quantized models, while QAT imposes…

Computer Vision and Pattern Recognition · Computer Science 2025-08-18 Xinhao Wang , Zhiwei Lin , Zhongyu Xia , Yongtao Wang

Despite the success of CNN models on a variety of Image classification and segmentation tasks, their extensive computational and storage demands pose considerable challenges for real-world deployment on resource-constrained devices.…

Computer Vision and Pattern Recognition · Computer Science 2025-09-10 Ahmed Luqman , Khuzemah Qazi , Murray Patterson , Malik Jahan Khan , Imdadullah Khan

In this short note, we propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization while preserving…

Machine Learning · Computer Science 2023-04-06 Johannes Maly , Rayan Saab

This paper presents incremental network quantization (INQ), a novel method, targeting to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained…

Computer Vision and Pattern Recognition · Computer Science 2017-08-28 Aojun Zhou , Anbang Yao , Yiwen Guo , Lin Xu , Yurong Chen

Although considerable progress has been obtained in neural network quantization for efficient inference, existing methods are not scalable to heterogeneous devices as one dedicated model needs to be trained, transmitted, and stored for one…

Machine Learning · Computer Science 2022-12-13 Hai Wu , Ruifei He , Haoru Tan , Xiaojuan Qi , Kaibin Huang

Quantization of neural networks provides benefits of inference in less compute and memory requirements. Previous work in quantization lack two important aspects which this work provides. First almost all previous work in quantization used a…

Computer Vision and Pattern Recognition · Computer Science 2025-12-12 Zia Badar

Neural network quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation, while preserving the performance of the original…

Computer Vision and Pattern Recognition · Computer Science 2022-07-05 Geon Park , Jaehong Yoon , Haiyang Zhang , Xing Zhang , Sung Ju Hwang , Yonina C. Eldar

The inherent heavy computation of deep neural networks prevents their widespread applications. A widely used method for accelerating model inference is quantization, by replacing the input operands of a network using fixed-point values.…

Computer Vision and Pattern Recognition · Computer Science 2020-05-28 Hongwei Xie , Shuo Zhang , Huanghao Ding , Yafei Song , Baitao Shao , Conggang Hu , Ling Cai , Mingyang Li

The deployment of deep neural networks on resource-constrained devices relies on quantization. While static, uniform quantization applies a fixed bit-width to all inputs, it fails to adapt to their varying complexity. Dynamic,…

Machine Learning · Computer Science 2026-03-24 Hazem Hesham Yousef Shalby , Fabrizio Pittorino , Francesca Palermo , Diana Trojaniello , Manuel Roveri

Deep neural networks have been applied in many applications exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant…

Computer Vision and Pattern Recognition · Computer Science 2021-06-16 Tailin Liang , John Glossner , Lei Wang , Shaobo Shi , Xiaotong Zhang

Quantization is a promising approach for reducing the inference time and memory footprint of neural networks. However, most existing quantization methods require access to the original training dataset for retraining during quantization.…

Computer Vision and Pattern Recognition · Computer Science 2020-03-29 Yaohui Cai , Zhewei Yao , Zhen Dong , Amir Gholami , Michael W. Mahoney , Kurt Keutzer

This paper addresses a challenging problem - how to reduce energy consumption without incurring performance drop when deploying deep neural networks (DNNs) at the inference stage. In order to alleviate the computation and storage burdens,…

Machine Learning · Computer Science 2019-01-09 Xue Geng , Jie Fu , Bin Zhao , Jie Lin , Mohamed M. Sabry Aly , Christopher Pal , Vijay Chandrasekhar

Neural network quantization aims to accelerate and trim full-precision neural network models by using low bit approximations. Methods adopting the quantization aware training (QAT) paradigm have recently seen a rapid growth, but are often…

Computer Vision and Pattern Recognition · Computer Science 2023-07-21 Ke Zhu , Yin-Yin He , Jianxin Wu
‹ Prev 1 2 3 10 Next ›