Related papers

Related papers: CSMPQ: Class Separability Based Mixed-Precision Qua…

200 papers

To bridge the ever-increasing gap between deep neural networks' complexity and hardware capability, network quantization has attracted more and more research attention. The latest trend of mixed precision quantization takes advantage of…

Machine Learning · Computer Science 2025-10-28 Yuexiao Ma, Taisong Jin, Xiawu Zheng, Yan Wang, Huixia Li, Yongjian Wu, Guannan Jiang, Wei Zhang, Rongrong Ji

Mixed Precision Quantization (MPQ) has become an essential technique for optimizing neural networks by determining the optimal bitwidth per layer. Existing MPQ methods, however, face a major hurdle: they require a computationally expensive…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Lianbo Ma, Jianlun Ma, Yuee Zhou, Guoyang Xie, Qiang He, Zhichao Lu

Large Language Models (LLMs) have demonstrated remarkable success across a wide range of language tasks, but their deployment on edge devices remains challenging due to the substantial memory requirements imposed by their large parameter…

Computation and Language · Computer Science 2025-02-05 Zihan Chen, Bike Xie, Jundong Li, Cong Shen

Despite its improvements in coding performance compared to traditional codecs, Learned Image Compression (LIC) suffers from large computational costs for storage and deployment. Model quantization offers an effective solution to reduce the…

Image and Video Processing · Electrical Eng. & Systems 2025-06-03 Md Adnan Faisal Hossain, Zhihao Duan, Fengqing Zhu

Mixed-precision quantization (MPQ) is crucial for deploying deep neural networks on resource-constrained devices, but finding the optimal bit-width for each layer represents a complex combinatorial optimization problem. Current…

Machine Learning · Computer Science 2026-03-24 Mehmet Emre Akbulut, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri

Large language models have transformed the comprehension and generation of natural language tasks, but they come with substantial memory and computational requirements. Quantization techniques have emerged as a promising avenue for…

Computation and Language · Computer Science 2024-12-10 Amitash Nanda, Sree Bhargavi Balija, Debashis Sahoo

Mixed precision quantization (MPQ) is an effective quantization approach for achieving an accuracy-complexity trade-off in neural networks by assigning different bit-widths to the activations and weights of each layer. The typical way of…

Machine Learning · Computer Science 2025-08-06 Haidong Kang, Lianbo Ma, Guo Yu, Shangce Gao

Quantization is essential for Neural Network (NN) compression, reducing model size and computational demands by using lower bit-width data types, though aggressive reduction often hampers accuracy. Mixed Precision (MP) mitigates this…

Machine Learning · Computer Science 2025-05-20 Shmulik Markovich-Golan, Daniel Ohayon, Itay Niv, Yair Hanani

Large DNNs with mixed-precision quantization can achieve ultra-high compression while retaining high classification performance. However, because of the challenges in finding an accurate metric that can guide the optimization process, these…

Computer Vision and Pattern Recognition · Computer Science 2021-12-30 Souvik Kundu, Shikai Wang, Qirui Sun, Peter A. Beerel, Massoud Pedram

While federated learning (FL) systems often utilize quantization to battle communication and computational bottlenecks, they have heretofore been limited to deploying fixed-precision quantization schemes. Meanwhile, the concept of…

Machine Learning · Computer Science 2023-12-01 Huancheng Chen, Haris Vikalo

The exponentially large discrete search space in mixed-precision quantization (MPQ) makes it hard to determine the optimal bit-width for each layer. Previous works usually resort to iterative search methods on the training set, which…

Machine Learning · Computer Science 2023-03-07 Chen Tang, Kai Ouyang, Zhi Wang, Yifei Zhu, Yaowei Wang, Wen Ji, Wenwu Zhu

Layer-wise mixed-precision quantization (LMPQ) enables effective compression under extreme low-bit settings by allocating higher precision to sensitive layers. However, existing methods typically treat all intra-layer weight modules…

This work targets the commonly used FPGA (field-programmable gate array) devices as the hardware platform for DNN edge computing. We focus on DNN quantization as the main model compression technique. The novelty of this work is: We use a…

Machine Learning · Computer Science 2021-11-02 Sung-En Chang, Yanyu Li, Mengshu Sun, Yanzhi Wang, Xue Lin

Mixed-precision quantization has been widely applied on deep neural networks (DNNs) as it leads to significantly better efficiency-accuracy tradeoffs compared to uniform quantization. Meanwhile, determining the exact precision of each layer…

Computer Vision and Pattern Recognition · Computer Science 2023-03-01 Lirui Xiao, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang

Mixed-precision quantization is a promising approach for compressing large language models under tight memory budgets. However, existing mixed-precision methods typically suffer from one of two limitations: they either rely on expensive…

Machine Learning · Computer Science 2026-02-03 Xin Nie, Haicheng Zhang, Liang Dong, Beining Feng, Jinhong Weng, Guiling Sun

Since model quantization helps to reduce the model size and computation latency, it has been successfully applied in many applications of mobile phones, embedded devices and smart chips. The mixed-precision quantization model can match…

Computer Vision and Pattern Recognition · Computer Science 2021-03-05 Qigong Sun, Licheng Jiao, Yan Ren, Xiufang Li, Fanhua Shang, Fang Liu

The rapid scaling of language models (LMs) has resulted in unprecedented computational, memory, and energy requirements, making their training and deployment increasingly unsustainable. Quantization has emerged as an essential compression…

Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices. However, it is still hard for extremely low-bit…

Computer Vision and Pattern Recognition · Computer Science 2021-11-03 Kohei Yamamoto

In this paper, we propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference. Conventional methods require the consistency of datasets for bitwidth search and model deployment to guarantee the policy…

Computer Vision and Pattern Recognition · Computer Science 2021-08-06 Ziwei Wang, Han Xiao, Jiwen Lu, Jie Zhou

Mixed-precision quantization (MPQ) suffers from the time-consuming process of searching for the optimal bit-width allocation (i.e., the policy) for each layer, especially when using large-scale datasets such as ILSVRC-2012. This limits the…

Computer Vision and Pattern Recognition · Computer Science 2023-08-24 Chen Tang, Kai Ouyang, Zenghao Chai, Yunpeng Bai, Yuan Meng, Zhi Wang, Wenwu Zhu