Related papers: Mixed-Precision Quantization for Deep Vision Model…

200 papers

Mixed precision quantization (MPQ) is an effective quantization approach for achieving an accuracy-complexity trade-off in neural networks by assigning different bit-widths to the activations and weights of each layer. The typical way of…

Machine Learning · Computer Science 2025-08-06 Haidong Kang, Lianbo Ma, Guo Yu, Shangce Gao

Mixed Precision Quantization (MPQ) has become an essential technique for optimizing neural networks by determining the optimal bitwidth per layer. Existing MPQ methods, however, face a major hurdle: they require a computationally expensive…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Lianbo Ma, Jianlun Ma, Yuee Zhou, Guoyang Xie, Qiang He, Zhichao Lu

Quantization is a technique for creating efficient Deep Neural Networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision. Quantization reduces model size and inference…

Machine Learning · Computer Science 2023-10-02 Eliska Kloberdanz, Wei Le

Quantization is a widely used technique to compress and accelerate deep neural networks. However, conventional quantization methods use the same bit-width for all (or most of) the layers, which often suffer significant accuracy degradation…

Computer Vision and Pattern Recognition · Computer Science 2021-10-14 Weihan Chen, Peisong Wang, Jian Cheng

Despite its improvements in coding performance compared to traditional codecs, Learned Image Compression (LIC) suffers from large computational costs for storage and deployment. Model quantization offers an effective solution to reduce the…

Image and Video Processing · Electrical Eng. & Systems 2025-06-03 Md Adnan Faisal Hossain, Zhihao Duan, Fengqing Zhu

Mixed-precision quantization (MPQ) is crucial for deploying deep neural networks on resource-constrained devices, but finding the optimal bit-width for each layer represents a complex combinatorial optimization problem. Current…

Machine Learning · Computer Science 2026-03-24 Mehmet Emre Akbulut, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri

How can we accurately quantize a pre-trained Vision Transformer model? Quantization algorithms compress Vision Transformers (ViTs) into low-bit formats, reducing memory and computation demands with minimal accuracy degradation. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-17 Minjun Kim, Jaeri Lee, Jongjin Kim, Jeongin Yun, Yongmo Kwon, U Kang

Large Language Models (LLMs) have demonstrated remarkable success across a wide range of language tasks, but their deployment on edge devices remains challenging due to the substantial memory requirements imposed by their large parameter…

Computation and Language · Computer Science 2025-02-05 Zihan Chen, Bike Xie, Jundong Li, Cong Shen

Quantization is essential for Neural Network (NN) compression, reducing model size and computational demands by using lower bit-width data types, though aggressive reduction often hampers accuracy. Mixed Precision (MP) mitigates this…

Machine Learning · Computer Science 2025-05-20 Shmulik Markovich-Golan, Daniel Ohayon, Itay Niv, Yair Hanani

Quantization is of significance for compressing the over-parameterized deep neural models and deploying them on resource-limited devices. Fixed-precision quantization suffers from performance drop due to the limited numerical representation…

Computer Vision and Pattern Recognition · Computer Science 2024-06-17 Chen Tang, Yuan Meng, Jiacheng Jiang, Shuzhao Xie, Rongwei Lu, Xinzhu Ma, Zhi Wang, Wenwu Zhu

Quantized Neural Networks (QNN) with extremely low-bitwidth data have proven promising in efficient storage and computation on edge devices. To further reduce the accuracy drop while increasing speedup, layer-wise mixed-precision…

Machine Learning · Computer Science 2025-08-14 Zijun Jiang, Yangdi Lyu

Diffusion models (DMs) generate remarkably high-quality images via the stochastic denoising process, which unfortunately incurs high sampling time. Post-quantizing the trained diffusion models in fixed bit-widths, e.g., 4 bits on weights…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Rocco Manz Maruzzelli, Basile Lewandowski, Lydia Y. Chen

Efficient model inference is an important and practical issue in the deployment of deep neural networks on resource-constrained platforms. Network quantization addresses this problem effectively by leveraging low-bit representation and…

Computer Vision and Pattern Recognition · Computer Science 2020-01-01 Tianshu Chu, Qin Luo, Jie Yang, Xiaolin Huang

In order to deploy deep models in a computationally efficient manner, model quantization approaches have been frequently used. In addition, as new hardware that supports mixed bitwidth arithmetic operations, recent research on mixed…

Machine Learning · Computer Science 2022-07-12 Xijie Huang, Zhiqiang Shen, Shichao Li, Zechun Liu, Xianghong Hu, Jeffry Wicaksana, Eric Xing, Kwang-Ting Cheng

The rapid scaling of language models (LMs) has resulted in unprecedented computational, memory, and energy requirements, making their training and deployment increasingly unsustainable. Quantization has emerged as an essential compression…

Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for…

Machine Learning · Computer Science 2023-01-26 Matteo Risso, Alessio Burrello, Luca Benini, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari

Diffusion models have received wide attention in generation tasks. However, the expensive computation cost prevents the application of diffusion models in resource-constrained scenarios. Quantization emerges as a practical solution that…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Weilun Feng, Haotong Qin, Chuanguang Yang, Zhulin An, Libo Huang, Boyu Diao, Fei Wang, Renshuai Tao, Yongjun Xu, Michele Magno

Quantization is widely adopted as a model compression technique, which obtains efficient models by converting floating-point weights and activations in the neural network into lower-bit integers. Quantization has been proven to work well on…

Computer Vision and Pattern Recognition · Computer Science 2022-09-15 Lingran Zhao, Zhen Dong, Kurt Keutzer

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science 2018-11-14 Hsin-Pai Cheng, Yuanjun Huang, Xuyang Guo, Yifei Huang, Feng Yan, Hai Li, Yiran Chen

Large DNNs with mixed-precision quantization can achieve ultra-high compression while retaining high classification performance. However, because of the challenges in finding an accurate metric that can guide the optimization process, these…

Computer Vision and Pattern Recognition · Computer Science 2021-12-30 Souvik Kundu, Shikai Wang, Qirui Sun, Peter A. Beerel, Massoud Pedram