Related papers: EasyQuant: Post-training Quantization via Scale Op…

Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming

Lately, post-training quantization methods have gained considerable attention, as they are simple to use, and require only a small unlabeled calibration set. This small dataset cannot be used to fine-tune the model without significant…

Machine Learning · Computer Science 2020-12-15 Itay Hubara , Yury Nahshan , Yair Hanani , Ron Banner , Daniel Soudry

Post-training 4-bit quantization of convolution networks for rapid-deployment

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of…

Computer Vision and Pattern Recognition · Computer Science 2019-05-30 Ron Banner , Yury Nahshan , Elad Hoffer , Daniel Soudry

Quantization Range Estimation for Convolutional Neural Networks

Post-training quantization for reducing the storage of deep neural network models has been demonstrated to be an effective way in various tasks. However, low-bit quantization while maintaining model accuracy is a challenging problem. In…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Bingtao Yang , Yujia Wang , Mengzhi Jiao , Hongwei Huo

EfQAT: An Efficient Framework for Quantization-Aware Training

Quantization-aware training (QAT) schemes have been shown to achieve near-full precision accuracy. They accomplish this by training a quantized model for multiple epochs. This is computationally expensive, mainly because of the full…

Machine Learning · Computer Science 2024-11-19 Saleh Ashkboos , Bram Verhoef , Torsten Hoefler , Evangelos Eleftheriou , Martino Dazzi

EfficientQuant: An Efficient Post-Training Quantization for CNN-Transformer Hybrid Models on Edge Devices

Hybrid models that combine convolutional and transformer blocks offer strong performance in computer vision (CV) tasks but are resource-intensive for edge deployment. Although post-training quantization (PTQ) can help reduce resource…

Computer Vision and Pattern Recognition · Computer Science 2025-06-16 Shaibal Saha , Lanyu Xu

Rescaling-Aware Training for Efficient Deployment of Deep Learning Models on Full-Integer Hardware

Integer AI inference significantly reduces computational complexity in embedded systems. Quantization-aware training (QAT) helps mitigate accuracy degradation associated with post-training quantization but still overlooks the impact of…

Machine Learning · Computer Science 2025-10-14 Lion Mueller , Alberto Garcia-Ortiz , Ardalan Najafi , Adam Fuks , Lennart Bamberg

Quantizing deep convolutional networks for efficient inference: A whitepaper

We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision…

Machine Learning · Computer Science 2018-06-22 Raghuraman Krishnamoorthi

PD-Quant: Post-Training Quantization based on Prediction Difference Metric

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jiawei Liu , Lin Niu , Zhihang Yuan , Dawei Yang , Xinggang Wang , Wenyu Liu

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference

Quantization enables efficient acceleration of deep neural networks by reducing model memory footprint and exploiting low-cost integer math hardware units. Quantization maps floating-point weights and activations in a trained model to…

Machine Learning · Computer Science 2021-02-11 Steve Dai , Rangharajan Venkatesan , Haoxing Ren , Brian Zimmer , William J. Dally , Brucek Khailany

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

How to efficiently serve ever-larger trained natural language models in practice has become exceptionally challenging even for powerful cloud servers due to their prohibitive memory/computation requirements. In this work, we present an…

Computation and Language · Computer Science 2022-06-07 Zhewei Yao , Reza Yazdani Aminabadi , Minjia Zhang , Xiaoxia Wu , Conglong Li , Yuxiong He

Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference

Deep neural networks have achieved state-of-the-art results in a wide range of applications, from natural language processing and computer vision to speech recognition. However, as tasks become increasingly complex, model sizes continue to…

Computer Vision and Pattern Recognition · Computer Science 2025-05-21 Tomer Gafni , Asaf Karnieli , Yair Hanani

Post-Training Quantization for 3D Medical Image Segmentation: A Practical Study on Real Inference Engines

Quantizing deep neural networks ,reducing the precision (bit-width) of their computations, can remarkably decrease memory usage and accelerate processing, making these models more suitable for large-scale medical imaging applications with…

Computer Vision and Pattern Recognition · Computer Science 2025-01-30 Chongyu Qu , Ritchie Zhao , Ye Yu , Bin Liu , Tianyuan Yao , Junchao Zhu , Bennett A. Landman , Yucheng Tang , Yuankai Huo

QFT: Post-training quantization via fast joint finetuning of all degrees of freedom

The post-training quantization (PTQ) challenge of bringing quantized neural net accuracy close to original has drawn much attention driven by industry demand. Many of the methods emphasize optimization of a specific degree-of-freedom (DoF),…

Machine Learning · Statistics 2023-03-21 Alex Finkelstein , Ella Fuchs , Idan Tal , Mark Grobman , Niv Vosco , Eldad Meller

Sensitivity-Aware Post-Training Quantization for Deep Neural Networks

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Zekang Zheng , Haokun Li , Yaofo Chen , Mingkui Tan , Qing Du

Beacon: Post-Training Quantization with Integrated Grid Selection

Quantization is a widely used compression technique for reducing the memory and computation costs of large pre-trained models. A key challenge in per-channel post-training quantization (PTQ) is selecting appropriate scaling factors to…

Machine Learning · Computer Science 2026-02-18 Shihao Zhang , Rayan Saab

LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

With the commercialization of large language models (LLMs), weight-activation quantization has emerged to compress and accelerate LLMs, achieving high throughput while reducing inference costs. However, existing post-training quantization…

Machine Learning · Computer Science 2025-02-11 Jung Hyun Lee , Jeonghoon Kim , June Yong Yang , Se Jung Kwon , Eunho Yang , Kang Min Yoo , Dongsoo Lee

EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization

Quantization is a key method for deploying deep neural networks on edge devices with limited memory and computation resources. Recent improvements in Post-Training Quantization (PTQ) methods were achieved by an additional local optimization…

Computer Vision and Pattern Recognition · Computer Science 2024-09-27 Ofir Gordon , Elad Cohen , Hai Victor Habi , Arnon Netzer

A Data-Free Analytical Quantization Scheme for Deep Learning Models

Despite the success of CNN models on a variety of Image classification and segmentation tasks, their extensive computational and storage demands pose considerable challenges for real-world deployment on resource-constrained devices.…

Computer Vision and Pattern Recognition · Computer Science 2025-09-10 Ahmed Luqman , Khuzemah Qazi , Murray Patterson , Malik Jahan Khan , Imdadullah Khan

COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization

Post-training quantization (PTQ) has emerged as a practical approach to compress large neural networks, making them highly efficient for deployment. However, effectively reducing these models to their low-bit counterparts without…

Machine Learning · Computer Science 2024-10-22 Aozhong Zhang , Zi Yang , Naigang Wang , Yingyong Qi , Jack Xin , Xin Li , Penghang Yin

PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization

Quantization is one of the most effective methods to compress neural networks, which has achieved great success on convolutional neural networks (CNNs). Recently, vision transformers have demonstrated great potential in computer vision.…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Zhihang Yuan , Chenhao Xue , Yiqi Chen , Qiang Wu , Guangyu Sun