Related papers: Efficient Bitwidth Search for Practical Mixed Prec…

Rethinking Differentiable Search for Mixed-Precision Neural Networks

Low-precision networks, with weights and activations quantized to low bit-width, are widely used to accelerate inference on edge devices. However, current solutions are uniform, using identical bit-width for all filters. This fails to…

Machine Learning · Computer Science 2020-04-14 Zhaowei Cai , Nuno Vasconcelos

One Weight Bitwidth to Rule Them All

Weight quantization for deep ConvNets has shown promising results for applications such as image classification and semantic segmentation and is especially important for applications where memory storage is limited. However, when aiming for…

Machine Learning · Computer Science 2020-09-01 Ting-Wu Chin , Pierce I-Jen Chuang , Vikas Chandra , Diana Marculescu

A Practical Mixed Precision Algorithm for Post-Training Quantization

Neural network quantization is frequently used to optimize model size, latency and power consumption for on-device deployment of neural networks. In many cases, a target bit-width is set for an entire network, meaning every layer get…

Machine Learning · Computer Science 2023-02-13 Nilesh Prasad Pandey , Markus Nagel , Mart van Baalen , Yin Huang , Chirag Patel , Tijmen Blankevoort

Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection

Efficient model inference is an important and practical issue in the deployment of deep neural network on resource constraint platforms. Network quantization addresses this problem effectively by leveraging low-bit representation and…

Computer Vision and Pattern Recognition · Computer Science 2020-01-01 Tianshu Chu , Qin Luo , Jie Yang , Xiaolin Huang

BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization

Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks, and thus, have been widely investigated. However, it lacks a systematic method to determine the…

Machine Learning · Computer Science 2021-02-23 Huanrui Yang , Lin Duan , Yiran Chen , Hai Li

Efficient Multi-bit Quantization Network Training via Weight Bias Correction and Bit-wise Coreset Sampling

Multi-bit quantization networks enable flexible deployment of deep neural networks by supporting multiple precision levels within a single model. However, existing approaches suffer from significant training overhead as full-dataset updates…

Computer Vision and Pattern Recognition · Computer Science 2025-10-24 Jinhee Kim , Jae Jun An , Kang Eun Jeon , Jong Hwan Ko

QBitOpt: Fast and Accurate Bitwidth Reallocation during Training

Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices. In particular, mixed precision quantized (MPQ) networks, whose layers can be quantized to different bitwidths,…

Machine Learning · Computer Science 2023-07-11 Jorn Peters , Marios Fournarakis , Markus Nagel , Mart van Baalen , Tijmen Blankevoort

Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes

Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for…

Machine Learning · Computer Science 2023-01-26 Matteo Risso , Alessio Burrello , Luca Benini , Enrico Macii , Massimo Poncino , Daniele Jahier Pagliari

MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search

Quantization is a technique for creating efficient Deep Neural Networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision. Quantization reduces model size and inference…

Machine Learning · Computer Science 2023-10-02 Eliska Kloberdanz , Wei Le

MetaMix: Meta-state Precision Searcher for Mixed-precision Activation Quantization

Mixed-precision quantization of efficient networks often suffer from activation instability encountered in the exploration of bit selections. To address this problem, we propose a novel method called MetaMix which consists of bit selection…

Machine Learning · Computer Science 2024-04-10 Han-Byul Kim , Joo Hyung Lee , Sungjoo Yoo , Hong-Seok Kim

Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search

Recent work in network quantization has substantially reduced the time and space complexity of neural network inference, enabling their deployment on embedded and mobile devices with limited computational and memory resources. However,…

Computer Vision and Pattern Recognition · Computer Science 2018-12-04 Bichen Wu , Yanghan Wang , Peizhao Zhang , Yuandong Tian , Peter Vajda , Kurt Keutzer

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved…

Machine Learning · Computer Science 2023-07-07 Georg Rutishauser , Francesco Conti , Luca Benini

Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations

This paper tackles the problem of training a deep convolutional neural network of both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging due to the non-differentiability of the quantizer, which may…

Computer Vision and Pattern Recognition · Computer Science 2021-06-07 Bohan Zhuang , Jing Liu , Mingkui Tan , Lingqiao Liu , Ian Reid , Chunhua Shen

Differentiable Fine-grained Quantization for Deep Neural Network Compression

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science 2018-11-14 Hsin-Pai Cheng , Yuanjun Huang , Xuyang Guo , Yifei Huang , Feng Yan , Hai Li , Yiran Chen

Towards Effective Low-bitwidth Convolutional Neural Networks

This paper tackles the problem of training a deep convolutional neural network with both low-precision weights and low-bitwidth activations. Optimizing a low-precision network is very challenging since the training process can easily get…

Computer Vision and Pattern Recognition · Computer Science 2021-06-05 Bohan Zhuang , Chunhua Shen , Mingkui Tan , Lingqiao Liu , Ian Reid

FracBits: Mixed Precision Quantization via Fractional Bit-Widths

Model quantization helps to reduce model size and latency of deep neural networks. Mixed precision quantization is favorable with customized hardwares supporting arithmetic operations at multiple bit-widths to achieve maximum efficiency. We…

Computer Vision and Pattern Recognition · Computer Science 2020-12-04 Linjie Yang , Qing Jin

Searching for Accurate Binary Neural Architectures

Binary neural networks have attracted tremendous attention due to the efficiency for deploying them on mobile devices. Since the weak expression ability of binary weights and features, their accuracy is usually much lower than that of…

Machine Learning · Computer Science 2019-09-18 Mingzhu Shen , Kai Han , Chunjing Xu , Yunhe Wang

Bayesian Bits: Unifying Quantization and Pruning

We introduce Bayesian Bits, a practical method for joint mixed precision quantization and pruning through gradient based optimization. Bayesian Bits employs a novel decomposition of the quantization operation, which sequentially considers…

Machine Learning · Computer Science 2020-10-28 Mart van Baalen , Christos Louizos , Markus Nagel , Rana Ali Amjad , Ying Wang , Tijmen Blankevoort , Max Welling

Learnable Mixed-precision and Dimension Reduction Co-design for Low-storage Activation

Recently, deep convolutional neural networks (CNNs) have achieved many eye-catching results. However, deploying CNNs on resource-constrained edge devices is constrained by limited memory bandwidth for transmitting large intermediated data…

Image and Video Processing · Electrical Eng. & Systems 2022-07-20 Yu-Shan Tai , Cheng-Yang Chang , Chieh-Fang Teng , An-Yeu Wu

Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference

For efficient neural network inference, it is desirable to achieve state-of-the-art accuracy with the simplest networks requiring the least computation, memory, and power. Quantizing networks to lower precision is a powerful technique for…

Machine Learning · Computer Science 2024-01-12 Deepika Bablani , Jeffrey L. McKinstry , Steven K. Esser , Rathinakumar Appuswamy , Dharmendra S. Modha