Related papers: FlexRound: Learnable Rounding based on Element-wis…

TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction

Large language models (LLMs) have revolutionized natural language processing, albeit at the cost of immense memory and computation requirements. Post-training quantization (PTQ) is becoming the de facto method to reduce the memory footprint…

Machine Learning · Computer Science 2024-10-28 Yuhang Li , Priyadarshini Panda

Pack-PTQ: Advancing Post-training Quantization of Neural Networks by Pack-wise Reconstruction

Post-training quantization (PTQ) has evolved as a prominent solution for compressing complex models, which advocates a small calibration dataset and avoids end-to-end retraining. However, most existing PTQ methods employ block-wise…

Computer Vision and Pattern Recognition · Computer Science 2025-05-02 Changjun Li , Runqing Jiang , Zhuo Song , Pengpeng Yu , Ye Zhang , Yulan Guo

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

We study the challenging task of neural network quantization without end-to-end retraining, called Post-training Quantization (PTQ). PTQ usually requires a small subset of training data but produces less powerful quantized models than…

Machine Learning · Computer Science 2021-07-27 Yuhang Li , Ruihao Gong , Xu Tan , Yang Yang , Peng Hu , Qi Zhang , Fengwei Yu , Wei Wang , Shi Gu

Post-Training Piecewise Linear Quantization for Deep Neural Networks

Quantization plays an important role in the energy-efficient deployment of deep neural networks on resource-limited devices. Post-training quantization is highly desirable since it does not require retraining or access to the full training…

Computer Vision and Pattern Recognition · Computer Science 2020-03-20 Jun Fang , Ali Shafiee , Hamzah Abdel-Aziz , David Thorsley , Georgios Georgiadis , Joseph Hassoun

ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation

Rotation-based Post-Training Quantization (PTQ) has emerged as a promising solution for mitigating activation outliers in the quantization of Large Language Models (LLMs). Global rotation methods achieve inference efficiency by fusing…

Computer Vision and Pattern Recognition · Computer Science 2026-05-29 Suyoung Kim , Sunghyun Wee , Hyeonjin Kim , Kyomin Hwang , Hyunho Lee , Nojun Kwak

EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization

Quantization is a key method for deploying deep neural networks on edge devices with limited memory and computation resources. Recent improvements in Post-Training Quantization (PTQ) methods were achieved by an additional local optimization…

Computer Vision and Pattern Recognition · Computer Science 2024-09-27 Ofir Gordon , Elad Cohen , Hai Victor Habi , Arnon Netzer

LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

With the commercialization of large language models (LLMs), weight-activation quantization has emerged to compress and accelerate LLMs, achieving high throughput while reducing inference costs. However, existing post-training quantization…

Machine Learning · Computer Science 2025-02-11 Jung Hyun Lee , Jeonghoon Kim , June Yong Yang , Se Jung Kwon , Eunho Yang , Kang Min Yoo , Dongsoo Lee

FlexQ: Efficient Post-training INT6 Quantization for LLM Serving via Algorithm-System Co-Design

Large Language Models (LLMs) demonstrate exceptional performance but entail significant memory and computational costs, restricting their practical deployment. While existing INT4/INT8 quantization reduces these costs, they often degrade…

Machine Learning · Computer Science 2025-11-04 Hao Zhang , Aining Jia , Weifeng Bu , Yushu Cai , Kai Sheng , Hao Chen , Xin He

Towards Efficient Post-training Quantization of Pre-trained Language Models

Network quantization has gained increasing attention with the rapid growth of large pre-trained language models~(PLMs). However, most existing quantization methods for PLMs follow quantization-aware training~(QAT) that requires end-to-end…

Computation and Language · Computer Science 2021-10-01 Haoli Bai , Lu Hou , Lifeng Shang , Xin Jiang , Irwin King , Michael R. Lyu

Iteratively Training Look-Up Tables for Network Quantization

Operating deep neural networks on devices with limited resources requires the reduction of their memory footprints and computational requirements. In this paper we introduce a training method, called look-up table quantization, LUT-Q, which…

Machine Learning · Computer Science 2018-11-14 Fabien Cardinaux , Stefan Uhlich , Kazuki Yoshiyama , Javier Alonso García , Stephen Tiedemann , Thomas Kemp , Akira Nakamura

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization

Large transformer models have demonstrated remarkable success. Post-training quantization (PTQ), which requires only a small dataset for calibration and avoids end-to-end retraining, is a promising solution for compressing these large…

Machine Learning · Computer Science 2024-02-09 Zhikai Li , Xuewen Liu , Jing Zhang , Qingyi Gu

FLRQ: Faster LLM Quantization with Flexible Low-Rank Matrix Sketching

Traditional post-training quantization (PTQ) is considered an effective approach to reduce model size and accelerate inference of large-scale language models (LLMs). However, existing low-rank PTQ methods require costly fine-tuning to…

Machine Learning · Computer Science 2026-01-12 Hongyaoxing Gul , Lijuan Hu , Shuzi Niu , Fangfang Liu

A Comprehensive Evaluation on Quantization Techniques for Large Language Models

For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is rapidly evolving. Though many papers report breakthrough results, they are often…

Machine Learning · Computer Science 2026-01-30 Yutong Liu , Cairong Zhao , Guosheng Hu

Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile and TVs. Existing PTQ schemes, however, consume…

Machine Learning · Computer Science 2024-11-06 Junhan Kim , Chungman Lee , Eulrang Cho , Kyungphil Park , Ho-young Kim , Joonyoung Kim , Yongkweon Jeon

UWC: Unit-wise Calibration Towards Rapid Network Compression

This paper introduces a post-training quantization~(PTQ) method achieving highly efficient Convolutional Neural Network~ (CNN) quantization with high performance. Previous PTQ methods usually reduce compression error via performing…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Chen Lin , Zheyang Li , Bo Peng , Haoji Hu , Wenming Tan , Ye Ren , Shiliang Pu

LoopQ: Quantization for Recursive Transformers

Looped language models (LoopLMs) improve parameter efficiency by recursively reusing Transformer blocks, enabling deeper computation under a fixed model size. However, this reuse makes LoopLMs more fragile under post-training quantization…

Machine Learning · Computer Science 2026-05-19 Rui Fang , Hsi-Wen Chen , Ming-Syan Chen

PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models

Post-training quantization (PTQ) of large language models (LLMs) to extremely low bit-widths remains challenging due to the fundamental trade-off between computational efficiency and representational capacity. While existing ultra-low-bit…

Machine Learning · Computer Science 2026-01-05 He Xiao , Runming Yang , Qingyao Yang , Wendong Xu , Zhen Li , Yupeng Su , Zhengwu Liu , Hongxia Yang , Ngai Wong

Attention Round for Post-Training Quantization

At present, the quantification methods of neural network models are mainly divided into post-training quantization (PTQ) and quantization aware training (QAT). Post-training quantization only need a small part of the data to complete the…

Machine Learning · Computer Science 2022-07-08 Huabin Diao , Gongyan Li , Shaoyun Xu , Yuexing Hao

PD-Quant: Post-Training Quantization based on Prediction Difference Metric

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jiawei Liu , Lin Niu , Zhihang Yuan , Dawei Yang , Xinggang Wang , Wenyu Liu

Post-training Quantization for Neural Networks with Provable Guarantees

While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. By replacing the weights of a neural network with quantized…

Machine Learning · Computer Science 2023-01-18 Jinjie Zhang , Yixuan Zhou , Rayan Saab