Related papers: Zero-Shot Quantization via Weight-Space Arithmetic

Post-Training Quantization for Vision Transformer

Recently, transformer has achieved remarkable performance on a variety of computer vision applications. Compared with mainstream convolutional neural networks, vision transformers are often of sophisticated architectures for extracting…

Computer Vision and Pattern Recognition · Computer Science 2021-06-29 Zhenhua Liu, Yunhe Wang, Kai Han, Siwei Ma, Wen Gao

AIQViT: Architecture-Informed Post-Training Quantization for Vision Transformers

Post-training quantization (PTQ) has emerged as a promising solution for reducing the storage and computational cost of vision transformers (ViTs). Recent advances primarily target at crafting quantizers to deal with peculiar activations…

Computer Vision and Pattern Recognition · Computer Science 2025-02-10 Runqing Jiang, Ye Zhang, Longguang Wang, Pengpeng Yu, Yulan Guo

PD-Quant: Post-Training Quantization based on Prediction Difference Metric

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

Zero-shot Adversarial Quantization

Model quantization is a promising approach to compress deep neural networks and accelerate inference, making it possible to be deployed on mobile and edge devices. To retain the high performance of full-precision models, most existing…

Computer Vision and Pattern Recognition · Computer Science 2021-03-31 Yuang Liu, Wei Zhang, Jun Wang

PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization

Quantization is one of the most effective methods to compress neural networks, which has achieved great success on convolutional neural networks (CNNs). Recently, vision transformers have demonstrated great potential in computer vision.…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Zhihang Yuan, Chenhao Xue, Yiqi Chen, Qiang Wu, Guangyu Sun

RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers

Post-training quantization (PTQ), which only requires a tiny dataset for calibration without end-to-end retraining, is a light and practical model compression technique. Recently, several PTQ schemes for vision transformers (ViTs) have been…

Computer Vision and Pattern Recognition · Computer Science 2023-08-08 Zhikai Li, Junrui Xiao, Lianwei Yang, Qingyi Gu

Zero-Shot Dynamic Quantization for Transformer Inference

We introduce a novel run-time method for significantly reducing the accuracy loss associated with quantizing BERT-like models to 8-bit integers. Existing methods for quantizing models either modify the training procedure, or they require an…

Computation and Language · Computer Science 2022-11-18 Yousef El-Kurdi, Jerry Quinn, Avirup Sil

Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression

Post-training quantization (PTQ) reduces a model's memory footprint by mapping full precision weights into low bit weights without costly retraining, but can degrade its downstream performance especially in low 2- to 3-bit settings. We…

Machine Learning · Computer Science 2025-07-18 Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin, Mohit Bansal

QFT: Post-training quantization via fast joint finetuning of all degrees of freedom

The post-training quantization (PTQ) challenge of bringing quantized neural net accuracy close to original has drawn much attention driven by industry demand. Many of the methods emphasize optimization of a specific degree-of-freedom (DoF),…

Machine Learning · Statistics 2023-03-21 Alex Finkelstein, Ella Fuchs, Idan Tal, Mark Grobman, Niv Vosco, Eldad Meller

Sensitivity-Aware Post-Training Quantization for Deep Neural Networks

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Zekang Zheng, Haokun Li, Yaofo Chen, Mingkui Tan, Qing Du

Weight Group-wise Post-Training Quantization for Medical Foundation Model

Foundation models have achieved remarkable results in medical image analysis. However, its large network architecture and high computational complexity significantly impact inference speed, limiting its application on terminal medical…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Yineng Chen, Peng Huang, Aozhong Zhang, Hui Guo, Penghang Yin, Shu Hu, Shao Lin, Xin Li, Tzu-Jen Kao, Balakrishnan Prabhakaran, MingChing Chang, Xin Wang

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

The Diffusion Transformers Models (DiTs) have transitioned the network architecture from traditional UNets to transformers, demonstrating exceptional capabilities in image generation. Although DiTs have been widely applied to…

Computer Vision and Pattern Recognition · Computer Science 2024-09-02 Juncan Deng, Shuaiting Li, Zeyu Wang, Hong Gu, Kedong Xu, Kejie Huang

Differentiable, Bit-shifting, and Scalable Quantization without training neural network from scratch

Quantization of neural networks provides the benefit of inference with lower compute and memory requirements. Previous work in quantization lacks two important aspects which this work provides. First, almost all previous work in quantization used a…

Computer Vision and Pattern Recognition · Computer Science 2025-12-12 Zia Badar

Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction

Post-training quantization (PTQ) for vision transformers (ViTs) has received increasing attention from both academic and industrial communities due to its minimal data needs and high time efficiency. However, many current methods fail to…

Computer Vision and Pattern Recognition · Computer Science 2025-02-05 Yunshan Zhong, You Huang, Jiawei Hu, Yuxin Zhang, Rongrong Ji

Pack-PTQ: Advancing Post-training Quantization of Neural Networks by Pack-wise Reconstruction

Post-training quantization (PTQ) has evolved as a prominent solution for compressing complex models, which advocates a small calibration dataset and avoids end-to-end retraining. However, most existing PTQ methods employ block-wise…

Computer Vision and Pattern Recognition · Computer Science 2025-05-02 Changjun Li, Runqing Jiang, Zhuo Song, Pengpeng Yu, Ye Zhang, Yulan Guo

Task-Specific Zero-shot Quantization-Aware Training for Object Detection

Quantization is a key technique to reduce network size and computational complexity by representing the network parameters with a lower precision. Traditional quantization methods rely on access to original training data, which is often…

Computer Vision and Pattern Recognition · Computer Science 2025-07-23 Changhao Li, Xinrui Chen, Ji Wang, Kang Zhao, Jianfei Chen

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

Text-to-image diffusion models have emerged as a powerful framework for high-quality image generation given textual prompts. Their success has driven the rapid development of production-grade diffusion models that consistently increase in…

Computer Vision and Pattern Recognition · Computer Science 2024-09-04 Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk

Efficiently Training A Flat Neural Network Before It has been Quantizated

Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models. However, existing methods typically overlook the relationship between a well-trained NN and the…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Peng Xia, Junbiao Pang, Tianyang Cai

Empirical Evaluation of Post-Training Quantization Methods for Language Tasks

Transformer-based architectures like BERT have achieved great success in a wide range of Natural Language tasks. Despite their decent performance, the models still have numerous parameters and high computational complexity, impeding their…

Computation and Language · Computer Science 2022-11-01 Ting Hu, Christoph Meinel, Haojin Yang

Post-Training Quantization for Video Matting

Video matting is crucial for applications such as film production and virtual reality, yet deploying its computationally intensive models on resource-constrained devices presents challenges. Quantization is a key technique for model…

Computer Vision and Pattern Recognition · Computer Science 2025-06-13 Tianrui Zhu, Houyuan Chen, Ruihao Gong, Michele Magno, Haotong Qin, Kai Zhang