Related papers: PD-Quant: Post-Training Quantization based on Pred…

Sensitivity-Aware Post-Training Quantization for Deep Neural Networks

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Zekang Zheng, Haokun Li, Yaofo Chen, Mingkui Tan, Qing Du

PTQD: Accurate Post-Training Quantization for Diffusion Models

Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance

Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures. Despite its effectiveness and convenience, the reliability of PTQ methods…

Machine Learning · Computer Science 2023-03-24 Zhihang Yuan, Jiawei Liu, Jiaxiang Wu, Dawei Yang, Qiang Wu, Guangyu Sun, Wenyu Liu, Xinggang Wang, Bingzhe Wu

MetaAug: Meta-Data Augmentation for Post-Training Quantization

Post-Training Quantization (PTQ) has received significant attention because it requires only a small set of calibration data to quantize a full-precision model, which is more practical in real-world applications in which full access to a…

Computer Vision and Pattern Recognition · Computer Science 2024-07-30 Cuong Pham, Hoang Anh Dung, Cuong C. Nguyen, Trung Le, Dinh Phung, Gustavo Carneiro, Thanh-Toan Do

UWC: Unit-wise Calibration Towards Rapid Network Compression

This paper introduces a post-training quantization (PTQ) method achieving highly efficient Convolutional Neural Network (CNN) quantization with high performance. Previous PTQ methods usually reduce compression error via performing…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Chen Lin, Zheyang Li, Bo Peng, Haoji Hu, Wenming Tan, Ye Ren, Shiliang Pu

Post-Training Quantization in Brain-Computer Interfaces based on Event-Related Potential Detection

Post-training quantization (PTQ) is a technique used to optimize and reduce the memory footprint and computational requirements of machine learning models. It has been used primarily for neural networks. For Brain-Computer Interfaces (BCI)…

Human-Computer Interaction · Computer Science 2024-10-11 Hubert Cecotti, Dalvir Dhaliwal, Hardip Singh, Yogesh Kumar Meena

Attention Round for Post-Training Quantization

At present, quantization methods for neural network models are mainly divided into post-training quantization (PTQ) and quantization-aware training (QAT). Post-training quantization needs only a small part of the data to complete the…

Machine Learning · Computer Science 2022-07-08 Huabin Diao, Gongyan Li, Shaoyun Xu, Yuexing Hao

COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization

Post-training quantization (PTQ) has emerged as a practical approach to compress large neural networks, making them highly efficient for deployment. However, effectively reducing these models to their low-bit counterparts without…

Machine Learning · Computer Science 2024-10-22 Aozhong Zhang, Zi Yang, Naigang Wang, Yingyong Qi, Jack Xin, Xin Li, Penghang Yin

Rate-Distortion Optimized Post-Training Quantization for Learned Image Compression

Quantizing a floating-point neural network to its fixed-point representation is crucial for Learned Image Compression (LIC) because it improves decoding consistency for interoperability and reduces space-time complexity for implementation.…

Image and Video Processing · Electrical Eng. & Systems 2023-10-10 Junqi Shi, Ming Lu, Zhan Ma

QFT: Post-training quantization via fast joint finetuning of all degrees of freedom

The post-training quantization (PTQ) challenge of bringing quantized neural net accuracy close to original has drawn much attention driven by industry demand. Many of the methods emphasize optimization of a specific degree-of-freedom (DoF),…

Machine Learning · Statistics 2023-03-21 Alex Finkelstein, Ella Fuchs, Idan Tal, Mark Grobman, Niv Vosco, Eldad Meller

D$^2$Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs

Large language models (LLMs) deliver strong performance, but their high compute and memory costs make deployment difficult in resource-constrained scenarios. Weight-only post-training quantization (PTQ) is appealing, as it reduces memory…

Machine Learning · Computer Science 2026-02-09 Xianglong Yan, ChengZhu Bao, Zhiteng Li, Tianao Zhang, Shaoqiu Zhang, Ruobing Xie, Samm Sun, Yulun Zhang

Pack-PTQ: Advancing Post-training Quantization of Neural Networks by Pack-wise Reconstruction

Post-training quantization (PTQ) has evolved as a prominent solution for compressing complex models, which advocates a small calibration dataset and avoids end-to-end retraining. However, most existing PTQ methods employ block-wise…

Computer Vision and Pattern Recognition · Computer Science 2025-05-02 Changjun Li, Runqing Jiang, Zhuo Song, Pengpeng Yu, Ye Zhang, Yulan Guo

Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric

Efficient inference for object detection networks is a major challenge on edge devices. Post-Training Quantization (PTQ), which transforms a full-precision model into low bit-width directly, is an effective and convenient approach to reduce…

Computer Vision and Pattern Recognition · Computer Science 2023-05-09 Lin Niu, Jiawei Liu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models

Diffusion models have shown remarkable performance in image synthesis by progressively estimating a smooth transition from a Gaussian distribution of noise to a real image. Unfortunately, their practical deployment is limited by slow…

Machine Learning · Computer Science 2026-03-03 Dung Anh Hoang, Cuong Pham anh Trung Le, Jianfei Cai, Thanh-Toan Do

PTQAT: A Hybrid Parameter-Efficient Quantization Algorithm for 3D Perception Tasks

Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) represent two mainstream model quantization approaches. However, PTQ often leads to unacceptable performance degradation in quantized models, while QAT imposes…

Computer Vision and Pattern Recognition · Computer Science 2025-08-18 Xinhao Wang, Zhiwei Lin, Zhongyu Xia, Yongtao Wang

Assessing the Potential for Catastrophic Failure in Dynamic Post-Training Quantization

Post-training quantization (PTQ) has recently emerged as an effective tool for reducing the computational complexity and memory usage of a neural network by representing its weights and activations with lower precision. While this paradigm…

Machine Learning · Computer Science 2025-10-06 Logan Frank, Paul Ardis

Post-Training Quantization for Video Matting

Video matting is crucial for applications such as film production and virtual reality, yet deploying its computationally intensive models on resource-constrained devices presents challenges. Quantization is a key technique for model…

Computer Vision and Pattern Recognition · Computer Science 2025-06-13 Tianrui Zhu, Houyuan Chen, Ruihao Gong, Michele Magno, Haotong Qin, Kai Zhang

ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

Post-training quantization (PTQ) has emerged as a promising technique for mitigating memory consumption and computational costs in large language models (LLMs). However, a systematic examination of various quantization schemes, model…

Machine Learning · Computer Science 2023-05-29 Zhewei Yao, Xiaoxia Wu, Cheng Li, Stephen Youn, Yuxiong He

Mixed-Precision Graph Neural Quantization for Low Bit Large Language Models

Post-Training Quantization (PTQ) is pivotal for deploying large language models (LLMs) within resource-limited settings by significantly reducing resource demands. However, existing PTQ strategies underperform at low bit levels < 3 bits due…

Computation and Language · Computer Science 2025-01-31 Wanlong Liu, Yichen Xiao, Dingyi Zeng, Hongyang Zhao, Wenyu Chen, Malu Zhang

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization

Large transformer models have demonstrated remarkable success. Post-training quantization (PTQ), which requires only a small dataset for calibration and avoids end-to-end retraining, is a promising solution for compressing these large…

Machine Learning · Computer Science 2024-02-09 Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu