Related papers: Efficient Neural Compression with Inference-time D…

Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks

The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for…

Machine Learning · Computer Science · 2023-09-01 · Clemens JS Schaefer, Siddharth Joshi, Shan Li, Raul Blazquez

Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding

The ever-growing size of neural networks poses serious challenges on resource-constrained devices, such as embedded sensors. Compression algorithms that reduce their size can mitigate these problems, provided that model performance stays…

Machine Learning · Computer Science · 2025-05-27 · Alexander Conzelmann, Robert Bamler

Optimized learned entropy coding parameters for practical neural-based image and video compression

Neural-based image and video codecs are significantly more power-efficient when weights and activations are quantized to low-precision integers. While there are general-purpose techniques for reducing quantization effects, large losses can…

Image and Video Processing · Electrical Eng. & Systems · 2023-01-26 · Amir Said, Reza Pourreza, Hoang Le

Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection

Efficient model inference is an important and practical issue in the deployment of deep neural networks on resource-constrained platforms. Network quantization addresses this problem effectively by leveraging low-bit representation and…

Computer Vision and Pattern Recognition · Computer Science · 2020-01-01 · Tianshu Chu, Qin Luo, Jie Yang, Xiaolin Huang

Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes

Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for…

Machine Learning · Computer Science · 2023-01-26 · Matteo Risso, Alessio Burrello, Luca Benini, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari

Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks

The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization, which lead to latency and memory…

Machine Learning · Computer Science · 2024-09-25 · Beatrice Alessandra Motetti, Matteo Risso, Alessio Burrello, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari

Comprehensive Comparisons of Uniform Quantization in Deep Image Compression

In deep image compression, uniform quantization is applied to latent representations obtained by using an auto-encoder architecture for reducing bits and entropy coding. Quantization is a problem encountered in the end-to-end training of…

Image and Video Processing · Electrical Eng. & Systems · 2023-03-02 · Koki Tsubota, Kiyoharu Aizawa

Differentiable Fine-grained Quantization for Deep Neural Network Compression

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-14 · Hsin-Pai Cheng, Yuanjun Huang, Xuyang Guo, Yifei Huang, Feng Yan, Hai Li, Yiran Chen

Fixed-point Quantization of Convolutional Neural Networks for Quantized Inference on Embedded Platforms

Convolutional Neural Networks (CNNs) have proven to be a powerful state-of-the-art method for image classification tasks. One drawback, however, is the high computational complexity and high memory consumption of CNNs, which makes them…

Computer Vision and Pattern Recognition · Computer Science · 2021-02-04 · Rishabh Goyal, Joaquin Vanschoren, Victor van Acht, Stephan Nijssen

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved…

Machine Learning · Computer Science · 2023-07-07 · Georg Rutishauser, Francesco Conti, Luca Benini

AMED: Automatic Mixed-Precision Quantization for Edge Devices

Quantized neural networks are well known for reducing the latency, power consumption, and model size without significant harm to the performance. This makes them highly appropriate for systems with limited resources and low power capacity.…

Machine Learning · Computer Science · 2024-06-11 · Moshe Kimhi, Tal Rozen, Avi Mendelson, Chaim Baskin

Mixed-Precision Neural Networks: A Survey

Mixed-precision Deep Neural Networks achieve the energy efficiency and throughput needed for hardware deployment, particularly when the resources are limited, without sacrificing accuracy. However, the optimal per-layer bit precision that…

Machine Learning · Computer Science · 2022-08-15 · Mariam Rakka, Mohammed E. Fouda, Pramod Khargonekar, Fadi Kurdahi

Pareto-Optimal Quantized ResNet Is Mostly 4-bit

Quantization has become a popular technique to compress neural networks and reduce compute cost, but most prior work focuses on studying quantization without changing the network size. Many real-world applications of neural networks have…

Machine Learning · Computer Science · 2023-05-25 · AmirAli Abdolrashidi, Lisa Wang, Shivani Agrawal, Jonathan Malmaud, Oleg Rybakov, Chas Leichner, Lukasz Lew

A 1Mb mixed-precision quantized encoder for image classification and patch-based compression

Even though Application-Specific Integrated Circuits (ASICs) have proven to be a relevant choice for integrating inference at the edge, they are often limited in terms of applicability. In this paper, we demonstrate that an ASIC neural network…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-10 · Van Thien Nguyen, William Guicquero, Gilles Sicard

A Practical Mixed Precision Algorithm for Post-Training Quantization

Neural network quantization is frequently used to optimize model size, latency and power consumption for on-device deployment of neural networks. In many cases, a target bit-width is set for an entire network, meaning every layer get…

Machine Learning · Computer Science · 2023-02-13 · Nilesh Prasad Pandey, Markus Nagel, Mart van Baalen, Yin Huang, Chirag Patel, Tijmen Blankevoort

Towards Optimal Compression: Joint Pruning and Quantization

Model compression is instrumental in optimizing deep neural network inference on resource-constrained hardware. The prevailing methods for network compression, namely quantization and pruning, have been shown to enhance efficiency at the…

Machine Learning · Computer Science · 2023-06-13 · Ben Zandonati, Glenn Bucagu, Adrian Alan Pol, Maurizio Pierini, Olya Sirkin, Tal Kopetz

Bandwidth-efficient Inference for Neural Image Compression

With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices. In…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-08 · Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu

Bit Efficient Quantization for Deep Neural Networks

Quantization for deep neural networks has afforded models for edge devices that use less on-board memory and enable efficient low-power inference. In this paper, we present a comparison of model-parameter driven quantization approaches…

Computer Vision and Pattern Recognition · Computer Science · 2019-10-14 · Prateeth Nayak, David Zhang, Sek Chai

Mixed-Precision Quantization for Deep Vision Models with Integer Quadratic Programming

Quantization is a widely used technique to compress neural networks. Assigning uniform bit-widths across all layers can result in significant accuracy degradation at low precision and inefficiency at high precision. Mixed-precision…

Neural and Evolutionary Computing · Computer Science · 2025-04-09 · Zihao Deng, Sayeh Sharify, Xin Wang, Michael Orshansky

Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference

For efficient neural network inference, it is desirable to achieve state-of-the-art accuracy with the simplest networks requiring the least computation, memory, and power. Quantizing networks to lower precision is a powerful technique for…

Machine Learning · Computer Science · 2024-01-12 · Deepika Bablani, Jeffrey L. Mckinstry, Steven K. Esser, Rathinakumar Appuswamy, Dharmendra S. Modha