Related papers: MixQuant: Mixed Precision Quantization with a Bit-…

BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization

Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks, and thus, have been widely investigated. However, it lacks a systematic method to determine the…

Machine Learning · Computer Science 2021-02-23 Huanrui Yang , Lin Duan , Yiran Chen , Hai Li

Where and How to Enhance: Discovering Bit-Width Contribution for Mixed Precision Quantization

Mixed precision quantization (MPQ) is an effective quantization approach to achieve accuracy-complexity trade-off of neural network, through assigning different bit-widths to network activations and weights in each layer. The typical way of…

Machine Learning · Computer Science 2025-08-06 Haidong Kang , Lianbo Ma , Guo Yu , Shangce Gao

Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization

Quantization is a widely used technique to compress and accelerate deep neural networks. However, conventional quantization methods use the same bit-width for all (or most of) the layers, which often suffer significant accuracy degradation…

Computer Vision and Pattern Recognition · Computer Science 2021-10-14 Weihan Chen , Peisong Wang , Jian Cheng

Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes

Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for…

Machine Learning · Computer Science 2023-01-26 Matteo Risso , Alessio Burrello , Luca Benini , Enrico Macii , Massimo Poncino , Daniele Jahier Pagliari

FracBits: Mixed Precision Quantization via Fractional Bit-Widths

Model quantization helps to reduce model size and latency of deep neural networks. Mixed precision quantization is favorable with customized hardwares supporting arithmetic operations at multiple bit-widths to achieve maximum efficiency. We…

Computer Vision and Pattern Recognition · Computer Science 2020-12-04 Linjie Yang , Qing Jin

A Practical Mixed Precision Algorithm for Post-Training Quantization

Neural network quantization is frequently used to optimize model size, latency and power consumption for on-device deployment of neural networks. In many cases, a target bit-width is set for an entire network, meaning every layer get…

Machine Learning · Computer Science 2023-02-13 Nilesh Prasad Pandey , Markus Nagel , Mart van Baalen , Yin Huang , Chirag Patel , Tijmen Blankevoort

Retraining-free Model Quantization via One-Shot Weight-Coupling Learning

Quantization is of significance for compressing the over-parameterized deep neural models and deploying them on resource-limited devices. Fixed-precision quantization suffers from performance drop due to the limited numerical representation…

Computer Vision and Pattern Recognition · Computer Science 2024-06-17 Chen Tang , Yuan Meng , Jiacheng Jiang , Shuzhao Xie , Rongwei Lu , Xinzhu Ma , Zhi Wang , Wenwu Zhu

Differentiable Fine-grained Quantization for Deep Neural Network Compression

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science 2018-11-14 Hsin-Pai Cheng , Yuanjun Huang , Xuyang Guo , Yifei Huang , Feng Yan , Hai Li , Yiran Chen

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved…

Machine Learning · Computer Science 2023-07-07 Georg Rutishauser , Francesco Conti , Luca Benini

AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model…

Machine Learning · Computer Science 2024-04-29 Cédric Gernigon , Silviu-Ioan Filip , Olivier Sentieys , Clément Coggiola , Mickael Bruno

Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search

Recent work in network quantization has substantially reduced the time and space complexity of neural network inference, enabling their deployment on embedded and mobile devices with limited computational and memory resources. However,…

Computer Vision and Pattern Recognition · Computer Science 2018-12-04 Bichen Wu , Yanghan Wang , Peizhao Zhang , Yuandong Tian , Peter Vajda , Kurt Keutzer

Data Quality-aware Mixed-precision Quantization via Hybrid Reinforcement Learning

Mixed-precision quantization mostly predetermines the model bit-width settings before actual training due to the non-differential bit-width sampling process, obtaining sub-optimal performance. Worse still, the conventional static…

Artificial Intelligence · Computer Science 2023-02-10 Yingchun Wang , Jingcai Guo , Song Guo , Weizhan Zhang

QBitOpt: Fast and Accurate Bitwidth Reallocation during Training

Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices. In particular, mixed precision quantized (MPQ) networks, whose layers can be quantized to different bitwidths,…

Machine Learning · Computer Science 2023-07-11 Jorn Peters , Marios Fournarakis , Markus Nagel , Mart van Baalen , Tijmen Blankevoort

MSQ: Memory-Efficient Bit Sparsification Quantization

As deep neural networks (DNNs) see increased deployment on mobile and edge devices, optimizing model efficiency has become crucial. Mixed-precision quantization is widely favored, as it offers a superior balance between efficiency and…

Machine Learning · Computer Science 2025-07-31 Seokho Han , Seoyeon Yoon , Jinhee Kim , Dongwei Wang , Kang Eun Jeon , Huanrui Yang , Jong Hwan Ko

SDQ: Stochastic Differentiable Quantization with Mixed Precision

In order to deploy deep models in a computationally efficient manner, model quantization approaches have been frequently used. In addition, as new hardware that supports mixed bitwidth arithmetic operations, recent research on mixed…

Machine Learning · Computer Science 2022-07-12 Xijie Huang , Zhiqiang Shen , Shichao Li , Zechun Liu , Xianghong Hu , Jeffry Wicaksana , Eric Xing , Kwang-Ting Cheng

Adaptive Quantization for Deep Neural Network

In recent years Deep Neural Networks (DNNs) have been rapidly developed in various applications, together with increasingly complex architectures. The performance gain of these DNNs generally comes with high computational costs and large…

Machine Learning · Computer Science 2017-12-05 Yiren Zhou , Seyed-Mohsen Moosavi-Dezfooli , Ngai-Man Cheung , Pascal Frossard

Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning

Mixed Precision Quantization (MPQ) has become an essential technique for optimizing neural network by determining the optimal bitwidth per layer. Existing MPQ methods, however, face a major hurdle: they require a computationally expensive…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Lianbo Ma , Jianlun Ma , Yuee Zhou , Guoyang Xie , Qiang He , Zhichao Lu

DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory

Quantizing the weights of a neural network has two steps: (1) Finding a good low bit-complexity representation for weights (which we call the quantization grid) and (2) Rounding the original weights to values in the quantization grid. In…

Machine Learning · Computer Science 2025-01-14 Jerry Chee , Arturs Backurs , Rainie Heck , Li Zhang , Janardhan Kulkarni , Thomas Rothvoss , Sivakanth Gopi

Mixed-Precision Quantization for Deep Vision Models with Integer Quadratic Programming

Quantization is a widely used technique to compress neural networks. Assigning uniform bit-widths across all layers can result in significant accuracy degradation at low precision and inefficiency at high precision. Mixed-precision…

Neural and Evolutionary Computing · Computer Science 2025-04-09 Zihao Deng , Sayeh Sharify , Xin Wang , Michael Orshansky

A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification

Recent advancements in machine learning achieved by Deep Neural Networks (DNNs) have been significant. While demonstrating high accuracy, DNNs are associated with a huge number of parameters and computations, which leads to high memory…

Machine Learning · Computer Science 2023-12-20 Babak Rokh , Ali Azarpeyvand , Alireza Khanteymoori