Related papers: Improving Post Training Neural Quantization: Layer…

Post-training 4-bit quantization of convolution networks for rapid-deployment

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of…

Computer Vision and Pattern Recognition · Computer Science · 2019-05-30 · Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry

A Practical Mixed Precision Algorithm for Post-Training Quantization

Neural network quantization is frequently used to optimize model size, latency and power consumption for on-device deployment of neural networks. In many cases, a target bit-width is set for an entire network, meaning every layer get…

Machine Learning · Computer Science · 2023-02-13 · Nilesh Prasad Pandey, Markus Nagel, Mart van Baalen, Yin Huang, Chirag Patel, Tijmen Blankevoort

Quantization Range Estimation for Convolutional Neural Networks

Post-training quantization for reducing the storage of deep neural network models has been demonstrated to be an effective way in various tasks. However, low-bit quantization while maintaining model accuracy is a challenging problem. In…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-07 · Bingtao Yang, Yujia Wang, Mengzhi Jiao, Hongwei Huo

MetaAug: Meta-Data Augmentation for Post-Training Quantization

Post-Training Quantization (PTQ) has received significant attention because it requires only a small set of calibration data to quantize a full-precision model, which is more practical in real-world applications in which full access to a…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-30 · Cuong Pham, Hoang Anh Dung, Cuong C. Nguyen, Trung Le, Dinh Phung, Gustavo Carneiro, Thanh-Toan Do

EasyQuant: Post-training Quantization via Scale Optimization

8-bit quantization has been widely applied to accelerate network inference in various deep learning applications. There are two kinds of quantization methods: training-based quantization and post-training quantization. Training-based…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-01 · Di Wu, Qi Tang, Yongle Zhao, Ming Zhang, Ying Fu, Debing Zhang

Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients

Network quantization generally converts full-precision weights and/or activations into low-bit fixed-point values in order to accelerate an inference process. Recent approaches to network quantization further discretize the gradients into…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-18 · Dohyung Kim, Junghyup Lee, Jeimin Jeon, Jaehyeon Moon, Bumsub Ham

Interactions Across Blocks in Post-Training Quantization of Large Language Models

Post-training quantization is widely employed to reduce the computational demands of neural networks. Typically, individual substructures, such as layers or blocks of layers, are quantized with the objective of minimizing quantization…

Machine Learning · Computer Science · 2024-11-07 · Khasmamad Shabanovi, Lukas Wiest, Vladimir Golkov, Daniel Cremers, Thomas Pfeil

Efficient Multi-bit Quantization Network Training via Weight Bias Correction and Bit-wise Coreset Sampling

Multi-bit quantization networks enable flexible deployment of deep neural networks by supporting multiple precision levels within a single model. However, existing approaches suffer from significant training overhead as full-dataset updates…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-24 · Jinhee Kim, Jae Jun An, Kang Eun Jeon, Jong Hwan Ko

Understanding the Difficulty of Low-Precision Post-Training Quantization for LLMs

Large language models of high parameter counts are computationally expensive, yet can be made much more efficient by compressing their weights to very low numerical precision. This can be achieved either through post-training quantization…

Machine Learning · Computer Science · 2025-04-21 · Zifei Xu, Sayeh Sharify, Wanzin Yazar, Tristan Webb, Xin Wang

Bag of Tricks with Quantized Convolutional Neural Networks for image classification

Deep neural networks have been proven effective in a wide range of tasks. However, their high computational and memory costs make them impractical to deploy on resource-constrained devices. To address this issue, quantization schemes have…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-14 · Jie Hu, Mengze Zeng, Enhua Wu

Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats

Quantization of the weights and activations is one of the main methods to reduce the computational footprint of Deep Neural Networks (DNNs) training. Current methods enable 4-bit quantization of the forward phase. However, this constitutes…

Machine Learning · Computer Science · 2024-06-11 · Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry

Scalable Methods for 8-bit Training of Neural Networks

Quantized Neural Networks (QNNs) are often used to improve network efficiency during the inference phase, i.e. after the network has been trained. Extensive research in the field suggests many different quantization schemes. Still, the…

Machine Learning · Computer Science · 2018-06-19 · Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry

QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning

Transformer-based models have gained widespread popularity in both the computer vision (CV) and natural language processing (NLP) fields. However, significant challenges arise during post-training linear quantization, leading to noticeable…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-12 · Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin

A Data-Free Analytical Quantization Scheme for Deep Learning Models

Despite the success of CNN models on a variety of image classification and segmentation tasks, their extensive computational and storage demands pose considerable challenges for real-world deployment on resource-constrained devices.…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-10 · Ahmed Luqman, Khuzemah Qazi, Murray Patterson, Malik Jahan Khan, Imdadullah Khan

COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization

Post-training quantization (PTQ) has emerged as a practical approach to compress large neural networks, making them highly efficient for deployment. However, effectively reducing these models to their low-bit counterparts without…

Machine Learning · Computer Science · 2024-10-22 · Aozhong Zhang, Zi Yang, Naigang Wang, Yingyong Qi, Jack Xin, Xin Li, Penghang Yin

Bit Efficient Quantization for Deep Neural Networks

Quantization for deep neural networks has afforded models for edge devices that use less on-board memory and enable efficient low-power inference. In this paper, we present a comparison of model-parameter driven quantization approaches…

Computer Vision and Pattern Recognition · Computer Science · 2019-10-14 · Prateeth Nayak, David Zhang, Sek Chai

Post-Training Quantization for Vision Transformer

Recently, transformers have achieved remarkable performance on a variety of computer vision applications. Compared with mainstream convolutional neural networks, vision transformers often have sophisticated architectures for extracting…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-29 · Zhenhua Liu, Yunhe Wang, Kai Han, Siwei Ma, Wen Gao

Q-Rater: Non-Convex Optimization for Post-Training Uniform Quantization

Various post-training uniform quantization methods have usually been studied based on convex optimization. As a result, most previous methods rely on quantization error minimization and/or quadratic approximations. Such approaches are…

Machine Learning · Computer Science · 2021-05-06 · Byeongwook Kim, Dongsoo Lee, Yeonju Ro, Yongkweon Jeon, Se Jung Kwon, Baeseong Park, Daehwan Oh

Efficiently Training A Flat Neural Network Before It has been Quantizated

Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models. However, existing methods typically overlook the relationship between a well-trained NN and the…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-04 · Peng Xia, Junbiao Pang, Tianyang Cai

Probabilistic Calibration by Design for Neural Network Regression

Generating calibrated and sharp neural network predictive distributions for regression problems is essential for optimal decision-making in many real-world applications. To address the miscalibration issue of neural networks, various…

Machine Learning · Computer Science · 2024-03-19 · Victor Dheur, Souhaib Ben Taieb