Related papers: QDrop: Randomly Dropping Quantization for Extremel…

Quantization Meets Reasoning: Exploring and Mitigating Degradation of Low-Bit LLMs in Mathematical Reasoning

Low-bit post-training quantization (PTQ) is a practical route to deploy reasoning-capable LLMs under tight memory and latency budgets, yet it can markedly impair mathematical reasoning (drops up to 69.81% in our harder settings). We address…

Machine Learning · Computer Science 2026-01-21 Zhen Li, Yupeng Su, Songmiao Wang, Runming Yang, Congkai Xie, Aofan Liu, Ming Li, Jiannong Cao, Yuan Xie, Ngai Wong, Hongxia Yang

Assessing the Potential for Catastrophic Failure in Dynamic Post-Training Quantization

Post-training quantization (PTQ) has recently emerged as an effective tool for reducing the computational complexity and memory usage of a neural network by representing its weights and activations with lower precision. While this paradigm…

Machine Learning · Computer Science 2025-10-06 Logan Frank, Paul Ardis

What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study

Reasoning models excel at complex tasks such as coding and mathematics, yet their inference is often slow and token-inefficient. To improve the inference efficiency, post-training quantization (PTQ) usually comes with the cost of large…

Machine Learning · Computer Science 2026-01-22 Keyu Lv, Manyi Zhang, Xiaobo Xia, Jingchen Ni, Shannan Yan, Xianzhi Yu, Lu Hou, Chun Yuan, Haoli Bai

PD-Quant: Post-Training Quantization based on Prediction Difference Metric

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

A White Paper on Neural Network Quantization

While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge…

Machine Learning · Computer Science 2021-06-16 Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, Tijmen Blankevoort

QFT: Post-training quantization via fast joint finetuning of all degrees of freedom

The post-training quantization (PTQ) challenge of bringing quantized neural net accuracy close to original has drawn much attention driven by industry demand. Many of the methods emphasize optimization of a specific degree-of-freedom (DoF),…

Machine Learning · Statistics 2023-03-21 Alex Finkelstein, Ella Fuchs, Idan Tal, Mark Grobman, Niv Vosco, Eldad Meller

Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss

Network quantization, which aims to reduce the bit-lengths of the network weights and activations, has emerged for their deployments to resource-limited devices. Although recent studies have successfully discretized a full-precision…

Machine Learning · Computer Science 2021-09-07 Jung Hyun Lee, Jihun Yun, Sung Ju Hwang, Eunho Yang

Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance

Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures. Despite its effectiveness and convenience, the reliability of PTQ methods…

Machine Learning · Computer Science 2023-03-24 Zhihang Yuan, Jiawei Liu, Jiaxiang Wu, Dawei Yang, Qiang Wu, Guangyu Sun, Wenyu Liu, Xinggang Wang, Bingzhe Wu

Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric

Efficient inference for object detection networks is a major challenge on edge devices. Post-Training Quantization (PTQ), which transforms a full-precision model into low bit-width directly, is an effective and convenient approach to reduce…

Computer Vision and Pattern Recognition · Computer Science 2023-05-09 Lin Niu, Jiawei Liu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models

Post-training quantization (PTQ) of large language models (LLMs) to extremely low bit-widths remains challenging due to the fundamental trade-off between computational efficiency and representational capacity. While existing ultra-low-bit…

Machine Learning · Computer Science 2026-01-05 He Xiao, Runming Yang, Qingyao Yang, Wendong Xu, Zhen Li, Yupeng Su, Zhengwu Liu, Hongxia Yang, Ngai Wong

ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

Post-training quantization (PTQ) has emerged as a promising technique for mitigating memory consumption and computational costs in large language models (LLMs). However, a systematic examination of various quantization schemes, model…

Machine Learning · Computer Science 2023-05-29 Zhewei Yao, Xiaoxia Wu, Cheng Li, Stephen Youn, Yuxiong He

RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization

We introduce a Power-of-Two low-bit post-training quantization (PTQ) method for deep neural network that meets hardware requirements and does not call for long-time retraining. Power-of-Two quantization can convert the multiplication…

Computer Vision and Pattern Recognition · Computer Science 2022-09-27 Hongyi Yao, Pu Li, Jian Cao, Xiangcheng Liu, Chenying Xie, Bingzhang Wang

Efficiently Training A Flat Neural Network Before It has been Quantizated

Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models. However, existing methods typically overlook the relationship between a well-trained NN and the…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Peng Xia, Junbiao Pang, Tianyang Cai

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

Diffusion models have demonstrated remarkable capabilities in image synthesis and related generative tasks. Nevertheless, their practicality for real-world applications is constrained by substantial computational costs and latency issues.…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

PTQD: Accurate Post-Training Quantization for Diffusion Models

Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

Q-Diffusion: Quantizing Diffusion Models

Diffusion models have achieved great success in image synthesis through iterative noise estimation using deep neural networks. However, the slow inference, high memory consumption, and computation intensity of the noise estimation model…

Computer Vision and Pattern Recognition · Computer Science 2023-06-09 Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

We study the challenging task of neural network quantization without end-to-end retraining, called Post-training Quantization (PTQ). PTQ usually requires a small subset of training data but produces less powerful quantized models than…

Machine Learning · Computer Science 2021-07-27 Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, Shi Gu

Sensitivity-Aware Post-Training Quantization for Deep Neural Networks

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Zekang Zheng, Haokun Li, Yaofo Chen, Mingkui Tan, Qing Du

Post-Training Quantization in Brain-Computer Interfaces based on Event-Related Potential Detection

Post-training quantization (PTQ) is a technique used to optimize and reduce the memory footprint and computational requirements of machine learning models. It has been used primarily for neural networks. For Brain-Computer Interfaces (BCI)…

Human-Computer Interaction · Computer Science 2024-10-11 Hubert Cecotti, Dalvir Dhaliwal, Hardip Singh, Yogesh Kumar Meena

MetaAug: Meta-Data Augmentation for Post-Training Quantization

Post-Training Quantization (PTQ) has received significant attention because it requires only a small set of calibration data to quantize a full-precision model, which is more practical in real-world applications in which full access to a…

Computer Vision and Pattern Recognition · Computer Science 2024-07-30 Cuong Pham, Hoang Anh Dung, Cuong C. Nguyen, Trung Le, Dinh Phung, Gustavo Carneiro, Thanh-Toan Do