Related papers: LPCD: Unified Framework from Layer-Wise to Submodu…

LoaQ: Layer-wise Output Approximation Quantization

A natural and intuitive idea in model quantization is to approximate each component's quantized output to match its original. Motivated by this idea, most layer-wise post-training quantization (PTQ) methods focus on weight approximation at…

Machine Learning · Computer Science 2026-01-28 Li Lin, Xiaojun Wan

Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization

Layer-wise PTQ is a promising technique for compressing large language models (LLMs), due to its simplicity and effectiveness without requiring retraining. However, recent progress in this area is saturating, underscoring the need to…

Machine Learning · Computer Science 2026-01-14 Yamato Arai, Yuma Ichikawa

Towards Efficient Post-training Quantization of Pre-trained Language Models

Network quantization has gained increasing attention with the rapid growth of large pre-trained language models (PLMs). However, most existing quantization methods for PLMs follow quantization-aware training (QAT) that requires end-to-end…

Computation and Language · Computer Science 2021-10-01 Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu

Rethinking Output Alignment For 1-bit Post-Training Quantization of Large Language Models

Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive sizes hinder deployment on resource-constrained devices. To reduce their computational and memory burden, various compression…

Machine Learning · Computer Science 2026-05-18 Dung Anh Hoang, Cuong Pham, Cuong Nguyen, Trung le, Jianfei Cai, Thanh-Toan Do

LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection

Due to highly constrained computing power and memory, deploying 3D lidar-based detectors on edge devices equipped in autonomous vehicles and robots poses a crucial challenge. Being a convenient and straightforward model compression…

Computer Vision and Pattern Recognition · Computer Science 2024-01-30 Sifan Zhou, Liang Li, Xinyu Zhang, Bo Zhang, Shipeng Bai, Miao Sun, Ziyu Zhao, Xiaobo Lu, Xiangxiang Chu

UWC: Unit-wise Calibration Towards Rapid Network Compression

This paper introduces a post-training quantization (PTQ) method achieving highly efficient Convolutional Neural Network (CNN) quantization with high performance. Previous PTQ methods usually reduce compression error via performing…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Chen Lin, Zheyang Li, Bo Peng, Haoji Hu, Wenming Tan, Ye Ren, Shiliang Pu

QuantEase: Optimization-based Quantization for Language Models

With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing…

Machine Learning · Statistics 2023-12-04 Kayhan Behdin, Ayan Acharya, Aman Gupta, Qingquan Song, Siyu Zhu, Sathiya Keerthi, Rahul Mazumder

Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models

Low-bit post-training quantization (PTQ) is a pivotal technique for deploying Vision-Language Models (VLMs) on resource-constrained devices. However, existing PTQ methods often degrade VLMs' accuracy due to the heterogeneous activation…

Computer Vision and Pattern Recognition · Computer Science 2026-05-20 Yi Zhong, Haotong Qin, Xindong Zhang, Lei Zhang, Guolei Sun

PD-Quant: Post-Training Quantization based on Prediction Difference Metric

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models

Large language models (LLMs) have shown remarkable performance in various domains, but they are constrained by massive computational and storage costs. Quantization, an effective technique for compressing models to fit resource-limited…

Computation and Language · Computer Science 2026-04-14 Han Liu, Haotian Gao, Xiaotong Zhang, Changya Li, Feng Zhang, Wei Wang, Fenglong Ma, Hong Yu

Efficient Quantization Strategies for Latent Diffusion Models

Latent Diffusion Models (LDMs) capture the dynamic evolution of latent variables over time, blending patterns and multimodality in a generative system. Despite the proficiency of LDM in various applications, such as text-to-image…

Computer Vision and Pattern Recognition · Computer Science 2023-12-12 Yuewei Yang, Xiaoliang Dai, Jialiang Wang, Peizhao Zhang, Hongbo Zhang

Rethinking Practical and Efficient Quantization Calibration for Vision-Language Models

Post-training quantization (PTQ) is a primary approach for deploying large language models without fine-tuning, and the quantized performance is often strongly affected by the calibration in PTQ. By contrast, in vision-language models…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Zhenhao Shang, Haizhao Jing, Guoting Wei, Haokui Zhang, Rong Xiao, Jianqing Gao, Peng Wang

CoreQ: Learning-Free Mismatch Correction and Successive Rounding for Quantization

Post-training quantization (PTQ) enables efficient deployment of large language models by mapping pretrained weights to low-bit formats without retraining, typically using a small calibration set to minimize a layer-wise calibration…

Machine Learning · Computer Science 2026-05-12 Seohyeon Cha, Huancheng Chen, Dongjun Kim, Haoran Zhang, Kevin Chan, Gustavo de Veciana, Haris Vikalo

SliderQuant: Accurate Post-Training Quantization for LLMs

In this paper, we address post-training quantization (PTQ) for large language models (LLMs) from an overlooked perspective: given a pre-trained high-precision LLM, the predominant sequential quantization framework treats different layers…

Artificial Intelligence · Computer Science 2026-03-27 Shigeng Wang, Chao Li, Yangyuxuan Kang, Jiawei Fan, Zhonghong Ou, Anbang Yao

Progressive Mixed-Precision Decoding for Efficient LLM Inference

In spite of the great potential of large language models (LLMs) across various tasks, their deployment on resource-constrained devices remains challenging due to their excessive computational and memory demands. Quantization has emerged as…

Machine Learning · Computer Science 2025-02-28 Hao Mark Chen, Fuwen Tan, Alexandros Kouris, Royson Lee, Hongxiang Fan, Stylianos I. Venieris

Saliency-Aware Regularized Quantization Calibration for Large Language Models

Post-training quantization (PTQ) is an effective approach for deploying large language models (LLMs) under memory and latency constraints. Most existing PTQ methods determine quantization parameters by minimizing a layer-wise reconstruction…

Artificial Intelligence · Computer Science 2026-05-11 Yanlong Zhao, Xiaoyuan Cheng, Huihang Liu, Baihua He, Xinyu Zhang, Harrison Bo Hua Zhu, Wenlong Chen, Li Zeng, Zhuo Sun

PTQD: Accurate Post-Training Quantization for Diffusion Models

Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

Enhancing Post-Training Quantization via Future Activation Awareness

Post-training quantization (PTQ) is a widely used method to compress large language models (LLMs) without fine-tuning. It typically sets quantization hyperparameters (e.g., scaling factors) based on current-layer activations. Although this…

Machine Learning · Computer Science 2026-02-04 Zheqi Lv, Zhenxuan Fan, Qi Tian, Wenqiao Zhang, Yueting Zhuang

PQD: Post-training Quantization for Efficient Diffusion Models

Diffusion models (DMs) have demonstrated remarkable achievements in synthesizing images of high fidelity and diversity. However, the extensive computational requirements and slow generative speed of diffusion models have limited their widespread…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Jiaojiao Ye, Zhen Wang, Linnan Jiang

Rethinking Residual Errors in Compensation-based LLM Quantization

Methods based on weight compensation, which iteratively apply quantization and weight compensation to minimize the output error, have recently demonstrated remarkable success in quantizing Large Language Models (LLMs). The representative…

Machine Learning · Computer Science 2026-04-10 Shuaiting Li, Juncan Deng, Kedong Xu, Rongtao Deng, Hong Gu, Minghan Jiang, Haibin Shen, Kejie Huang