Related papers: PTQ-SL: Exploring the Sub-layerwise Post-training …

UWC: Unit-wise Calibration Towards Rapid Network Compression

This paper introduces a post-training quantization (PTQ) method achieving highly efficient Convolutional Neural Network (CNN) quantization with high performance. Previous PTQ methods usually reduce compression error via performing…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Chen Lin, Zheyang Li, Bo Peng, Haoji Hu, Wenming Tan, Ye Ren, Shiliang Pu

Sensitivity-Aware Post-Training Quantization for Deep Neural Networks

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Zekang Zheng, Haokun Li, Yaofo Chen, Mingkui Tan, Qing Du

Post-Training Piecewise Linear Quantization for Deep Neural Networks

Quantization plays an important role in the energy-efficient deployment of deep neural networks on resource-limited devices. Post-training quantization is highly desirable since it does not require retraining or access to the full training…

Computer Vision and Pattern Recognition · Computer Science 2020-03-20 Jun Fang, Ali Shafiee, Hamzah Abdel-Aziz, David Thorsley, Georgios Georgiadis, Joseph Hassoun

PD-Quant: Post-Training Quantization based on Prediction Difference Metric

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

Exploring Neural Networks Quantization via Layer-Wise Quantization Analysis

Quantization is an essential step in the efficient deployment of deep learning models and as such is an increasingly popular research topic. An important practical aspect that is not addressed in the current literature is how to analyze and…

Machine Learning · Computer Science 2020-12-16 Shachar Gluska, Mark Grobman

Deep Neural Network Compression with Single and Multiple Level Quantization

Network quantization is an effective solution to compress deep neural networks for practical usage. Existing network quantization methods cannot sufficiently exploit the depth information to generate low-bit compressed network. In this…

Machine Learning · Computer Science 2018-12-18 Yuhui Xu, Yongzhuang Wang, Aojun Zhou, Weiyao Lin, Hongkai Xiong

Post-training Quantization for Neural Networks with Provable Guarantees

While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. By replacing the weights of a neural network with quantized…

Machine Learning · Computer Science 2023-01-18 Jinjie Zhang, Yixuan Zhou, Rayan Saab

RepQ: Generalizing Quantization-Aware Training for Re-Parametrized Architectures

Existing neural networks are memory-consuming and computationally intensive, making deploying them challenging in resource-constrained environments. However, there are various methods to improve their efficiency. Two such methods are…

Machine Learning · Computer Science 2023-11-10 Anastasiia Prutianova, Alexey Zaytsev, Chung-Kuei Lee, Fengyu Sun, Ivan Koryakovskiy

Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance

Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures. Despite its effectiveness and convenience, the reliability of PTQ methods…

Machine Learning · Computer Science 2023-03-24 Zhihang Yuan, Jiawei Liu, Jiaxiang Wu, Dawei Yang, Qiang Wu, Guangyu Sun, Wenyu Liu, Xinggang Wang, Bingzhe Wu

Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting

Although neural networks have made remarkable advancements in various applications, they require substantial computational and memory resources. Network quantization is a powerful technique to compress neural networks, allowing for more…

Computer Vision and Pattern Recognition · Computer Science 2023-12-19 Dawei Yang, Ning He, Xing Hu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang

Post-training 4-bit quantization of convolution networks for rapid-deployment

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of…

Computer Vision and Pattern Recognition · Computer Science 2019-05-30 Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry

A Comprehensive Evaluation on Quantization Techniques for Large Language Models

For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is rapidly evolving. Though many papers report breakthrough results, they are often…

Machine Learning · Computer Science 2026-01-30 Yutong Liu, Cairong Zhao, Guosheng Hu

Attention Round for Post-Training Quantization

At present, the quantification methods of neural network models are mainly divided into post-training quantization (PTQ) and quantization aware training (QAT). Post-training quantization only needs a small part of the data to complete the…

Machine Learning · Computer Science 2022-07-08 Huabin Diao, Gongyan Li, Shaoyun Xu, Yuexing Hao

Transform Quantization for CNN (Convolutional Neural Network) Compression

In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN…

Computer Vision and Pattern Recognition · Computer Science 2021-11-09 Sean I. Young, Wang Zhe, David Taubman, Bernd Girod

Pack-PTQ: Advancing Post-training Quantization of Neural Networks by Pack-wise Reconstruction

Post-training quantization (PTQ) has evolved as a prominent solution for compressing complex models, which advocates a small calibration dataset and avoids end-to-end retraining. However, most existing PTQ methods employ block-wise…

Computer Vision and Pattern Recognition · Computer Science 2025-05-02 Changjun Li, Runqing Jiang, Zhuo Song, Pengpeng Yu, Ye Zhang, Yulan Guo

SliderQuant: Accurate Post-Training Quantization for LLMs

In this paper, we address post-training quantization (PTQ) for large language models (LLMs) from an overlooked perspective: given a pre-trained high-precision LLM, the predominant sequential quantization framework treats different layers…

Artificial Intelligence · Computer Science 2026-03-27 Shigeng Wang, Chao Li, Yangyuxuan Kang, Jiawei Fan, Zhonghong Ou, Anbang Yao

PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization

Quantization is one of the most effective methods to compress neural networks, which has achieved great success on convolutional neural networks (CNNs). Recently, vision transformers have demonstrated great potential in computer vision.…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Zhihang Yuan, Chenhao Xue, Yiqi Chen, Qiang Wu, Guangyu Sun

Interactions Across Blocks in Post-Training Quantization of Large Language Models

Post-training quantization is widely employed to reduce the computational demands of neural networks. Typically, individual substructures, such as layers or blocks of layers, are quantized with the objective of minimizing quantization…

Machine Learning · Computer Science 2024-11-07 Khasmamad Shabanovi, Lukas Wiest, Vladimir Golkov, Daniel Cremers, Thomas Pfeil

Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners

The quantization of large language models (LLMs) has been a prominent research area aimed at enabling their lightweight deployment in practice. Existing research about LLM's quantization has mainly explored the interplay between weights and…

Computation and Language · Computer Science 2025-05-16 Yifei Gao, Jie Ou, Lei Wang, Jun Cheng, Mengchu Zhou

Mixed-Precision Graph Neural Quantization for Low Bit Large Language Models

Post-Training Quantization (PTQ) is pivotal for deploying large language models (LLMs) within resource-limited settings by significantly reducing resource demands. However, existing PTQ strategies underperform at low bit levels < 3 bits due…

Computation and Language · Computer Science 2025-01-31 Wanlong Liu, Yichen Xiao, Dingyi Zeng, Hongyang Zhao, Wenyu Chen, Malu Zhang