Related papers: CSMPQ: Class Separability Based Mixed-Precision Qua…

OMPQ: Orthogonal Mixed Precision Quantization

To bridge the ever-increasing gap between deep neural networks' complexity and hardware capability, network quantization has attracted more and more research attention. The latest trend of mixed precision quantization takes advantage of…

Machine Learning · Computer Science · 2025-10-28 · Yuexiao Ma, Taisong Jin, Xiawu Zheng, Yan Wang, Huixia Li, Yongjian Wu, Guannan Jiang, Wei Zhang, Rongrong Ji

Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning

Mixed Precision Quantization (MPQ) has become an essential technique for optimizing neural networks by determining the optimal bitwidth per layer. Existing MPQ methods, however, face a major hurdle: they require a computationally expensive…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Lianbo Ma, Jianlun Ma, Yuee Zhou, Guoyang Xie, Qiang He, Zhichao Lu

Channel-Wise Mixed-Precision Quantization for Large Language Models

Large Language Models (LLMs) have demonstrated remarkable success across a wide range of language tasks, but their deployment on edge devices remains challenging due to the substantial memory requirements imposed by their large parameter…

Computation and Language · Computer Science · 2025-02-05 · Zihan Chen, Bike Xie, Jundong Li, Cong Shen

Flexible Mixed Precision Quantization for Learned Image Compression

Despite its improvements in coding performance compared to traditional codecs, Learned Image Compression (LIC) suffers from large computational costs for storage and deployment. Model quantization offers an effective solution to reduce the…

Image and Video Processing · Electrical Eng. & Systems · 2025-06-03 · Md Adnan Faisal Hossain, Zhihao Duan, Fengqing Zhu

InfoQ: Mixed-Precision Quantization via Global Information Flow

Mixed-precision quantization (MPQ) is crucial for deploying deep neural networks on resource-constrained devices, but finding the optimal bit-width for each layer represents a complex combinatorial optimization problem. Current…

Machine Learning · Computer Science · 2026-03-24 · Mehmet Emre Akbulut, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri

CPTQuant - A Novel Mixed Precision Post-Training Quantization Techniques for Large Language Models

Large language models have transformed the comprehension and generation of natural language tasks, but they come with substantial memory and computational requirements. Quantization techniques have emerged as a promising avenue for…

Computation and Language · Computer Science · 2024-12-10 · Amitash Nanda, Sree Bhargavi Balija, Debashis Sahoo

Where and How to Enhance: Discovering Bit-Width Contribution for Mixed Precision Quantization

Mixed precision quantization (MPQ) is an effective quantization approach to achieve an accuracy-complexity trade-off in neural networks by assigning different bit-widths to network activations and weights in each layer. The typical way of…

Machine Learning · Computer Science · 2025-08-06 · Haidong Kang, Lianbo Ma, Guo Yu, Shangce Gao
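The abstract above describes MPQ as giving each layer its own bit-widths. The sketch below shows what that means mechanically. It assumes uniform symmetric per-tensor fake-quantization of weights only, because this snippet does not specify the paper's own quantizer. The helpers `quantize_symmetric`, `apply_policy` and `mse`, along with the toy weights and policy, are hypothetical. Activations could be handled the same way, each with its own per-layer bit-width.

```python
def quantize_symmetric(values, bits):
    """Uniform symmetric fake-quantization of one tensor (a flat list of floats)."""
    if bits < 2:
        raise ValueError("symmetric signed quantization needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1                  # largest signed level, e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0.0 for _ in values]            # an all-zero tensor is exactly representable
    scale = max_abs / qmax                      # one scale per tensor, i.e. per layer
    # Round to the integer grid, clamp to [-qmax, qmax], then map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def apply_policy(layers, policy):
    """Quantize every layer with the bit-width the (hypothetical) policy assigns to it."""
    return {name: quantize_symmetric(w, policy[name]) for name, w in layers.items()}


def mse(a, b):
    """Mean squared quantization error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Toy weights and a mixed policy: 8 bits for one layer, 2 bits for the other.
    layers = {"conv1": [0.5, -1.0, 0.25, 0.8], "fc": [0.5, -1.0, 0.25, 0.8]}
    policy = {"conv1": 8, "fc": 2}
    quantized = apply_policy(layers, policy)
    for name in layers:
        print(name, policy[name], "bits, MSE =", mse(layers[name], quantized[name]))
```

Lowering a layer's bit-width reduces its storage and compute cost but increases its quantization error. An MPQ method decides which layers can afford that trade.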

Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs

Quantization is essential for Neural Network (NN) compression, reducing model size and computational demands by using lower bit-width data types, though aggressive reduction often hampers accuracy. Mixed Precision (MP) mitigates this…

Machine Learning · Computer Science · 2025-05-20 · Shmulik Markovich-Golan, Daniel Ohayon, Itay Niv, Yair Hanani

BMPQ: Bit-Gradient Sensitivity Driven Mixed-Precision Quantization of DNNs from Scratch

Large DNNs with mixed-precision quantization can achieve ultra-high compression while retaining high classification performance. However, because of the challenges in finding an accurate metric that can guide the optimization process, these…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-30 · Souvik Kundu, Shikai Wang, Qirui Sun, Peter A. Beerel, Massoud Pedram

Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices

While federated learning (FL) systems often utilize quantization to battle communication and computational bottlenecks, they have heretofore been limited to deploying fixed-precision quantization schemes. Meanwhile, the concept of…

Machine Learning · Computer Science · 2023-12-01 · Huancheng Chen, Haris Vikalo

Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance

The exponentially large discrete search space in mixed-precision quantization (MPQ) makes it hard to determine the optimal bit-width for each layer. Previous works usually resort to iterative search methods on the training set, which…

Machine Learning · Computer Science · 2023-03-07 · Chen Tang, Kai Ouyang, Zhi Wang, Yifei Zhu, Yaowei Wang, Wen Ji, Wenwu Zhu
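The "exponentially large discrete search space" in the abstract above follows from simple counting. With L layers and |B| candidate bit-widths per layer there are |B|^L weight-only policies. When weights and activations get independent bit-widths, the count becomes (|B|^2)^L. The hypothetical helper below just evaluates that count. The candidate set and layer counts in the example are illustrative assumptions, not values from the paper.

```python
def policy_count(num_layers, candidate_bits, separate_activation_bits=True):
    """Count the distinct layer-wise bit-width policies in a mixed-precision search space."""
    choices_per_layer = len(candidate_bits)
    if separate_activation_bits:
        # Weights and activations each pick a bit-width independently in every layer.
        choices_per_layer *= len(candidate_bits)
    return choices_per_layer ** num_layers


if __name__ == "__main__":
    bits = [2, 3, 4, 5, 6, 8]          # illustrative candidate bit-widths (assumption)
    for layers in (4, 18, 50):         # illustrative network depths (assumption)
        n = policy_count(layers, bits)
        # len(str(n)) - 1 is the order of magnitude of the count.
        print(f"{layers} layers: roughly 10^{len(str(n)) - 1} policies")
```

Even for moderately deep networks the count rules out exhaustive enumeration. This is the search problem the works in this list address.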

Beyond Outliers: A Data-Free Layer-wise Mixed-Precision Quantization Approach Driven by Numerical and Structural Dual-Sensitivity

Layer-wise mixed-precision quantization (LMPQ) enables effective compression under extreme low-bit settings by allocating higher precision to sensitive layers. However, existing methods typically treat all intra-layer weight modules…

Machine Learning · Computer Science · 2026-03-19 · Hengyuan Zhang, Xinrong Chen, Zunhai Su, Xiao Liang, Jing Xiong, Wendong Xu, He Xiao, Chaofan Tao, Wei Zhang, Ruobing Xie, Lei Jiang, Hayden Kwok-Hay So, Ngai Wong
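The layer-wise idea in the abstract above, allocating higher precision to sensitive layers, can be illustrated with a generic greedy allocator. This is not the paper's dual-sensitivity method. The function `allocate_bits`, the sensitivity scores, the parameter counts, the 4/8-bit choices and the average-bit budget are all hypothetical inputs chosen for illustration.

```python
def allocate_bits(sensitivity, params, low_bits=4, high_bits=8, budget_avg_bits=6.0):
    """Greedy sketch: promote the most sensitive layers to high_bits while the
    parameter-weighted average bit-width stays within budget_avg_bits."""
    policy = {name: low_bits for name in sensitivity}    # start every layer at low precision
    total_params = sum(params.values())
    used_bits = low_bits * total_params                  # total weight bits under the current policy
    # Visit layers from most to least sensitive.
    for name in sorted(sensitivity, key=sensitivity.get, reverse=True):
        extra = (high_bits - low_bits) * params[name]    # cost of promoting this layer
        if (used_bits + extra) / total_params <= budget_avg_bits:
            policy[name] = high_bits
            used_bits += extra
    return policy


if __name__ == "__main__":
    sensitivity = {"a": 0.9, "b": 0.1, "c": 0.5}   # hypothetical per-layer sensitivity scores
    params = {"a": 100, "b": 100, "c": 100}        # hypothetical parameter counts
    print(allocate_bits(sensitivity, params))
```

A greedy pass like this is cheap but only as good as the sensitivity scores it is given. Producing better scores is where the methods in this list differ.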

ILMPQ : An Intra-Layer Multi-Precision Deep Neural Network Quantization framework for FPGA

This work targets the commonly used FPGA (field-programmable gate array) devices as the hardware platform for DNN edge computing. We focus on DNN quantization as the main model compression technique. The novelty of this work is: We use a…

Machine Learning · Computer Science · 2021-11-02 · Sung-En Chang, Yanyu Li, Mengshu Sun, Yanzhi Wang, Xue Lin

CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification

Mixed-precision quantization has been widely applied to deep neural networks (DNNs) as it leads to significantly better efficiency-accuracy tradeoffs compared to uniform quantization. Meanwhile, determining the exact precision of each layer…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-01 · Lirui Xiao, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang

SFMP: Fine-Grained, Hardware-Friendly and Search-Free Mixed-Precision Quantization for Large Language Models

Mixed-precision quantization is a promising approach for compressing large language models under tight memory budgets. However, existing mixed-precision methods typically suffer from one of two limitations: they either rely on expensive…

Machine Learning · Computer Science · 2026-02-03 · Xin Nie, Haicheng Zhang, Liang Dong, Beining Feng, Jinhong Weng, Guiling Sun

Effective and Fast: A Novel Sequential Single Path Search for Mixed-Precision Quantization

Since model quantization helps to reduce the model size and computation latency, it has been successfully applied in many applications on mobile phones, embedded devices and smart chips. The mixed-precision quantization model can match…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-05 · Qigong Sun, Licheng Jiao, Yan Ren, Xiufang Li, Fanhua Shang, Fang Liu

Mixed-Precision Quantization for Language Models: Techniques and Prospects

The rapid scaling of language models (LMs) has resulted in unprecedented computational, memory, and energy requirements, making their training and deployment increasingly unsustainable. Quantization has emerged as an essential compression…

Machine Learning · Computer Science · 2025-10-21 · Mariam Rakka, Marios Fournarakis, Olga Krestinskaya, Jinane Bazzi, Khaled N. Salama, Fadi Kurdahi, Ahmed M. Eltawil, Mohammed E. Fouda

Learnable Companding Quantization for Accurate Low-bit Neural Networks

Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices. However, it is still hard for extremely low-bit…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-03 · Kohei Yamamoto

Generalizable Mixed-Precision Quantization via Attribution Rank Preservation

In this paper, we propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference. Conventional methods require the consistency of datasets for bitwidth search and model deployment to guarantee the policy…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-06 · Ziwei Wang, Han Xiao, Jiwen Lu, Jie Zhou

SEAM: Searching Transferable Mixed-Precision Quantization Policy through Large Margin Regularization

Mixed-precision quantization (MPQ) suffers from the time-consuming process of searching for the optimal bit-width allocation (i.e., the policy) for each layer, especially when using large-scale datasets such as ILSVRC-2012. This limits the…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-24 · Chen Tang, Kai Ouyang, Zenghao Chai, Yunpeng Bai, Yuan Meng, Zhi Wang, Wenwu Zhu