English
Related papers

Related papers: A Practical Mixed Precision Algorithm for Post-Tra…

200 papers

Efficient model inference is an important and practical issue in the deployment of deep neural network on resource constraint platforms. Network quantization addresses this problem effectively by leveraging low-bit representation and…

Computer Vision and Pattern Recognition · Computer Science 2020-01-01 Tianshu Chu , Qin Luo , Jie Yang , Xiaolin Huang

Serving large-scale machine learning (ML) models efficiently and with low latency has become challenging owing to increasing model size and complexity. Quantizing models can simultaneously reduce memory and compute requirements,…

Model quantization helps to reduce model size and latency of deep neural networks. Mixed precision quantization is favorable with customized hardwares supporting arithmetic operations at multiple bit-widths to achieve maximum efficiency. We…

Computer Vision and Pattern Recognition · Computer Science 2020-12-04 Linjie Yang , Qing Jin

Efficient deep neural network (DNN) inference on mobile or embedded devices typically involves quantization of the network parameters and activations. In particular, mixed precision networks achieve better performance than networks with…

Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved…

Machine Learning · Computer Science 2023-07-07 Georg Rutishauser , Francesco Conti , Luca Benini

We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a…

Machine Learning · Computer Science 2021-01-15 Xingchao Liu , Mao Ye , Dengyong Zhou , Qiang Liu

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of…

Computer Vision and Pattern Recognition · Computer Science 2019-05-30 Ron Banner , Yury Nahshan , Elad Hoffer , Daniel Soudry

Mixed-precision networks allow for a variable bit-width quantization for every layer in the network. A major limitation of existing work is that the bit-width for each layer must be predefined during training time. This allows little…

Machine Learning · Computer Science 2021-04-01 Adrian Bulat , Georgios Tzimiropoulos

Lately, post-training quantization methods have gained considerable attention, as they are simple to use, and require only a small unlabeled calibration set. This small dataset cannot be used to fine-tune the model without significant…

Machine Learning · Computer Science 2020-12-15 Itay Hubara , Yury Nahshan , Yair Hanani , Ron Banner , Daniel Soudry

Quantization is a technique for creating efficient Deep Neural Networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision. Quantization reduces model size and inference…

Machine Learning · Computer Science 2023-10-02 Eliska Kloberdanz , Wei Le

Mixed-precision Deep Neural Networks achieve the energy efficiency and throughput needed for hardware deployment, particularly when the resources are limited, without sacrificing accuracy. However, the optimal per-layer bit precision that…

Machine Learning · Computer Science 2022-08-15 Mariam Rakka , Mohammed E. Fouda , Pramod Khargonekar , Fadi Kurdahi

Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices. In particular, mixed precision quantized (MPQ) networks, whose layers can be quantized to different bitwidths,…

Machine Learning · Computer Science 2023-07-11 Jorn Peters , Marios Fournarakis , Markus Nagel , Mart van Baalen , Tijmen Blankevoort

Post-training quantization is a representative technique for compressing neural networks, making them smaller and more efficient for deployment on edge devices. However, an inaccessible user dataset often makes it difficult to ensure the…

Machine Learning · Computer Science 2022-01-05 Donghyun Lee , Minkyoung Cho , Seungwon Lee , Joonho Song , Changkyu Choi

As an effective technique to achieve the implementation of deep neural networks in edge devices, model quantization has been successfully applied in many practical applications. No matter the methods of quantization aware training (QAT) or…

Computer Vision and Pattern Recognition · Computer Science 2021-05-05 Qigong Sun , Xiufang Li , Yan Ren , Zhongjian Huang , Xu Liu , Licheng Jiao , Fang Liu

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science 2018-11-14 Hsin-Pai Cheng , Yuanjun Huang , Xuyang Guo , Yifei Huang , Feng Yan , Hai Li , Yiran Chen

Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for…

Machine Learning · Computer Science 2023-01-26 Matteo Risso , Alessio Burrello , Luca Benini , Enrico Macii , Massimo Poncino , Daniele Jahier Pagliari

Multi-bit quantization networks enable flexible deployment of deep neural networks by supporting multiple precision levels within a single model. However, existing approaches suffer from significant training overhead as full-dataset updates…

Computer Vision and Pattern Recognition · Computer Science 2025-10-24 Jinhee Kim , Jae Jun An , Kang Eun Jeon , Jong Hwan Ko

Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks. Recent efforts propose to quantize weights and activations from different layers with different precision to…

Machine Learning · Computer Science 2020-03-18 Yuhang Li , Wei Wang , Haoli Bai , Ruihao Gong , Xin Dong , Fengwei Yu

This paper tackles the problem of training a deep convolutional neural network of both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging due to the non-differentiability of the quantizer, which may…

Computer Vision and Pattern Recognition · Computer Science 2021-06-07 Bohan Zhuang , Jing Liu , Mingkui Tan , Lingqiao Liu , Ian Reid , Chunhua Shen

Weight quantization for deep ConvNets has shown promising results for applications such as image classification and semantic segmentation and is especially important for applications where memory storage is limited. However, when aiming for…

Machine Learning · Computer Science 2020-09-01 Ting-Wu Chin , Pierce I-Jen Chuang , Vikas Chandra , Diana Marculescu
‹ Prev 1 2 3 10 Next ›