Related papers: Mixed-Precision Quantization for Deep Vision Model…

Where and How to Enhance: Discovering Bit-Width Contribution for Mixed Precision Quantization

Mixed precision quantization (MPQ) is an effective quantization approach to achieve an accuracy-complexity trade-off in neural networks, through assigning different bit-widths to network activations and weights in each layer. The typical way of…

Machine Learning · Computer Science · 2025-08-06 · Haidong Kang, Lianbo Ma, Guo Yu, Shangce Gao

Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning

Mixed Precision Quantization (MPQ) has become an essential technique for optimizing neural networks by determining the optimal bitwidth per layer. Existing MPQ methods, however, face a major hurdle: they require a computationally expensive…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Lianbo Ma, Jianlun Ma, Yuee Zhou, Guoyang Xie, Qiang He, Zhichao Lu

MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search

Quantization is a technique for creating efficient Deep Neural Networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision. Quantization reduces model size and inference…

Machine Learning · Computer Science · 2023-10-02 · Eliska Kloberdanz, Wei Le

Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization

Quantization is a widely used technique to compress and accelerate deep neural networks. However, conventional quantization methods use the same bit-width for all (or most of) the layers, which often suffer significant accuracy degradation…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-14 · Weihan Chen, Peisong Wang, Jian Cheng

Flexible Mixed Precision Quantization for Learned Image Compression

Despite its improvements in coding performance compared to traditional codecs, Learned Image Compression (LIC) suffers from large computational costs for storage and deployment. Model quantization offers an effective solution to reduce the…

Image and Video Processing · Electrical Eng. & Systems · 2025-06-03 · Md Adnan Faisal Hossain, Zhihao Duan, Fengqing Zhu

InfoQ: Mixed-Precision Quantization via Global Information Flow

Mixed-precision quantization (MPQ) is crucial for deploying deep neural networks on resource-constrained devices, but finding the optimal bit-width for each layer represents a complex combinatorial optimization problem. Current…

Machine Learning · Computer Science · 2026-03-24 · Mehmet Emre Akbulut, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri

LampQ: Towards Accurate Layer-wise Mixed Precision Quantization for Vision Transformers

How can we accurately quantize a pre-trained Vision Transformer model? Quantization algorithms compress Vision Transformers (ViTs) into low-bit formats, reducing memory and computation demands with minimal accuracy degradation. However,…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-17 · Minjun Kim, Jaeri Lee, Jongjin Kim, Jeongin Yun, Yongmo Kwon, U Kang

Channel-Wise Mixed-Precision Quantization for Large Language Models

Large Language Models (LLMs) have demonstrated remarkable success across a wide range of language tasks, but their deployment on edge devices remains challenging due to the substantial memory requirements imposed by their large parameter…

Computation and Language · Computer Science · 2025-02-05 · Zihan Chen, Bike Xie, Jundong Li, Cong Shen

Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs

Quantization is essential for Neural Network (NN) compression, reducing model size and computational demands by using lower bit-width data types, though aggressive reduction often hampers accuracy. Mixed Precision (MP) mitigates this…

Machine Learning · Computer Science · 2025-05-20 · Shmulik Markovich-Golan, Daniel Ohayon, Itay Niv, Yair Hanani

Retraining-free Model Quantization via One-Shot Weight-Coupling Learning

Quantization is important for compressing over-parameterized deep neural models and deploying them on resource-limited devices. Fixed-precision quantization suffers from performance drop due to the limited numerical representation…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-17 · Chen Tang, Yuan Meng, Jiacheng Jiang, Shuzhao Xie, Rongwei Lu, Xinzhu Ma, Zhi Wang, Wenwu Zhu

MiCo: End-to-End Mixed Precision Neural Network Co-Exploration Framework for Edge AI

Quantized Neural Networks (QNN) with extremely low-bitwidth data have proven promising in efficient storage and computation on edge devices. To further reduce the accuracy drop while increasing speedup, layer-wise mixed-precision…

Machine Learning · Computer Science · 2025-08-14 · Zijun Jiang, Yangdi Lyu

MPQ-Diff: Mixed Precision Quantization for Diffusion Models

Diffusion models (DMs) generate remarkable high quality images via the stochastic denoising process, which unfortunately incurs high sampling time. Post-quantizing the trained diffusion models in fixed bit-widths, e.g., 4 bits on weights…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-03 · Rocco Manz Maruzzelli, Basile Lewandowski, Lydia Y. Chen

Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection

Efficient model inference is an important and practical issue in the deployment of deep neural networks on resource-constrained platforms. Network quantization addresses this problem effectively by leveraging low-bit representation and…

Computer Vision and Pattern Recognition · Computer Science · 2020-01-01 · Tianshu Chu, Qin Luo, Jie Yang, Xiaolin Huang

SDQ: Stochastic Differentiable Quantization with Mixed Precision

In order to deploy deep models in a computationally efficient manner, model quantization approaches have been frequently used. In addition, with new hardware that supports mixed-bitwidth arithmetic operations, recent research on mixed…

Machine Learning · Computer Science · 2022-07-12 · Xijie Huang, Zhiqiang Shen, Shichao Li, Zechun Liu, Xianghong Hu, Jeffry Wicaksana, Eric Xing, Kwang-Ting Cheng

Mixed-Precision Quantization for Language Models: Techniques and Prospects

The rapid scaling of language models (LMs) has resulted in unprecedented computational, memory, and energy requirements, making their training and deployment increasingly unsustainable. Quantization has emerged as an essential compression…

Machine Learning · Computer Science · 2025-10-21 · Mariam Rakka, Marios Fournarakis, Olga Krestinskaya, Jinane Bazzi, Khaled N. Salama, Fadi Kurdahi, Ahmed M. Eltawil, Mohammed E. Fouda

Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes

Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for…

Machine Learning · Computer Science · 2023-01-26 · Matteo Risso, Alessio Burrello, Luca Benini, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari

MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models

Diffusion models have received wide attention in generation tasks. However, the expensive computation cost prevents the application of diffusion models in resource-constrained scenarios. Quantization emerges as a practical solution that…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-17 · Weilun Feng, Haotong Qin, Chuanguang Yang, Zhulin An, Libo Huang, Boyu Diao, Fei Wang, Renshuai Tao, Yongjun Xu, Michele Magno

Analysis of Quantization on MLP-based Vision Models

Quantization is widely taken as a model compression technique, which obtains efficient models by converting floating-point weights and activations in the neural network into lower-bit integers. Quantization has been proven to work well on…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-15 · Lingran Zhao, Zhen Dong, Kurt Keutzer

Differentiable Fine-grained Quantization for Deep Neural Network Compression

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-14 · Hsin-Pai Cheng, Yuanjun Huang, Xuyang Guo, Yifei Huang, Feng Yan, Hai Li, Yiran Chen

BMPQ: Bit-Gradient Sensitivity Driven Mixed-Precision Quantization of DNNs from Scratch

Large DNNs with mixed-precision quantization can achieve ultra-high compression while retaining high classification performance. However, because of the challenges in finding an accurate metric that can guide the optimization process, these…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-30 · Souvik Kundu, Shikai Wang, Qirui Sun, Peter A. Beerel, Massoud Pedram