Related papers: Quantization Error Propagation: Revisiting Layer-W…

Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement…

Machine Learning · Computer Science 2026-02-04 Yinggan Xu, Risto Miikkulainen, Xin Qiu

Towards Efficient Post-training Quantization of Pre-trained Language Models

Network quantization has gained increasing attention with the rapid growth of large pre-trained language models (PLMs). However, most existing quantization methods for PLMs follow quantization-aware training (QAT) that requires end-to-end…

Computation and Language · Computer Science 2021-10-01 Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu

Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization

Post-Training Quantization (PTQ) has emerged as an effective technique for alleviating the substantial computational and memory overheads of Vision-Language Models (VLMs) by compressing both weights and activations without retraining the…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Chenwei Jia, Baoting Li, Xuchong Zhang, Mingzhuo Wei, Bochen Lin, Hongbin Sun

SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models

Large language models (LLMs) have shown remarkable performance in various domains, but they are constrained by massive computational and storage costs. Quantization, an effective technique for compressing models to fit resource-limited…

Computation and Language · Computer Science 2026-04-14 Han Liu, Haotian Gao, Xiaotong Zhang, Changya Li, Feng Zhang, Wei Wang, Fenglong Ma, Hong Yu

Layer-Wise High-Impact Parameter Ratio Optimization in Post-Training Quantization for Large Language Models

Large language models (LLMs) have significantly advanced natural language processing, but their massive parameter counts create substantial computational and memory challenges during deployment. Post-training quantization (PTQ) has emerged…

Machine Learning · Computer Science 2025-11-25 Cuong Pham, Hoang Anh Dung, Cuong C. Nguyen, Trung Le, Gustavo Carneiro, Thanh-Toan Do

Rethinking Output Alignment For 1-bit Post-Training Quantization of Large Language Models

Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive sizes hinder deployment on resource-constrained devices. To reduce their computational and memory burden, various compression…

Machine Learning · Computer Science 2026-05-18 Dung Anh Hoang, Cuong Pham, Cuong Nguyen, Trung Le, Jianfei Cai, Thanh-Toan Do

LQER: Low-Rank Quantization Error Reconstruction for LLMs

Post-training quantization of Large Language Models (LLMs) is challenging. In this work, we introduce Low-rank Quantization Error Reduction (LQER), which combines quantization and low-rank approximation to recover the model capability. LQER…

Machine Learning · Computer Science 2024-05-31 Cheng Zhang, Jianyi Cheng, George A. Constantinides, Yiren Zhao
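
The idea stated in the LQER abstract, pairing a quantized weight with a low-rank term that absorbs the quantization error, can be illustrated with a minimal pure-Python sketch. It assumes symmetric per-tensor round-to-nearest quantization and a plain rank-1 approximation of the error E = W - Q(W) found by power iteration. The helper names (`quantize_rtn`, `rank1_error_correction`, `frob`) and the example matrix are hypothetical, and this is not the paper's exact procedure, which the truncated abstract does not specify.

```python
import math

# Hypothetical illustration of "quantization + low-rank error correction";
# not LQER's published algorithm.

def quantize_rtn(w, bits=3):
    """Symmetric per-tensor round-to-nearest quantization (returns dequantized values)."""
    qmax = 2 ** (bits - 1) - 1
    absmax = max(abs(x) for row in w for x in row) or 1.0
    scale = absmax / qmax
    # Python's round() breaks ties to even; clamp to the signed integer range.
    return [[max(-qmax, min(qmax, round(x / scale))) * scale for x in row] for row in w]

def frob(m):
    """Frobenius norm of a list-of-lists matrix."""
    return math.sqrt(sum(x * x for row in m for x in row))

def rank1_error_correction(err, iters=50):
    """Rank-1 approximation u v^T of the error matrix via power iteration on E^T E."""
    rows, cols = len(err), len(err[0])
    v = [1.0] * cols
    for _ in range(iters):
        u = [sum(err[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = E v
        v = [sum(err[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = E^T u
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]                                               # keep ||v|| = 1
    u = [sum(err[i][j] * v[j] for j in range(cols)) for i in range(rows)]      # best u for this v
    return u, v

W = [[0.31, -1.20, 0.05, 0.88],
     [0.72, 0.10, -0.45, -0.33],
     [-0.96, 0.57, 0.21, 0.14]]
Q = quantize_rtn(W, bits=3)
E = [[w - q for w, q in zip(rw, rq)] for rw, rq in zip(W, Q)]
u, v = rank1_error_correction(E)
corrected = [[q + u[i] * v[j] for j, q in enumerate(rq)] for i, rq in enumerate(Q)]
residual = [[w - c for w, c in zip(rw, rc)] for rw, rc in zip(W, corrected)]

print(f"||W - Q(W)||_F          = {frob(E):.4f}")
print(f"||W - (Q(W) + uv^T)||_F = {frob(residual):.4f}")
```

Because u is chosen as E v for a unit vector v, the corrected residual satisfies ||E - u v^T||_F^2 = ||E||_F^2 - ||E v||^2, so the correction never increases the Frobenius error, and each extra rank costs only rows + cols stored values.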

PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models

Post-training quantization (PTQ) of large language models (LLMs) to extremely low bit-widths remains challenging due to the fundamental trade-off between computational efficiency and representational capacity. While existing ultra-low-bit…

Machine Learning · Computer Science 2026-01-05 He Xiao, Runming Yang, Qingyao Yang, Wendong Xu, Zhen Li, Yupeng Su, Zhengwu Liu, Hongxia Yang, Ngai Wong

LoopQ: Quantization for Recursive Transformers

Looped language models (LoopLMs) improve parameter efficiency by recursively reusing Transformer blocks, enabling deeper computation under a fixed model size. However, this reuse makes LoopLMs more fragile under post-training quantization…

Machine Learning · Computer Science 2026-05-19 Rui Fang, Hsi-Wen Chen, Ming-Syan Chen

Quantization Meets Reasoning: Exploring and Mitigating Degradation of Low-Bit LLMs in Mathematical Reasoning

Low-bit post-training quantization (PTQ) is a practical route to deploy reasoning-capable LLMs under tight memory and latency budgets, yet it can markedly impair mathematical reasoning (drops up to 69.81% in our harder settings). We address…

Machine Learning · Computer Science 2026-01-21 Zhen Li, Yupeng Su, Songmiao Wang, Runming Yang, Congkai Xie, Aofan Liu, Ming Li, Jiannong Cao, Yuan Xie, Ngai Wong, Hongxia Yang

Norm Tweaking: High-performance Low-bit Quantization of Large Language Models

As the size of large language models (LLMs) continues to grow, model compression without sacrificing accuracy has become a crucial challenge for deployment. While some quantization methods, such as GPTQ, have made progress in achieving…

Machine Learning · Computer Science 2023-12-14 Liang Li, Qingyuan Li, Bo Zhang, Xiangxiang Chu

Rethinking Practical and Efficient Quantization Calibration for Vision-Language Models

Post-training quantization (PTQ) is a primary approach for deploying large language models without fine-tuning, and the quantized performance is often strongly affected by the calibration in PTQ. By contrast, in vision-language models…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Zhenhao Shang, Haizhao Jing, Guoting Wei, Haokui Zhang, Rong Xiao, Jianqing Gao, Peng Wang

FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization

Although post-training quantization (PTQ) provides an efficient numerical compression scheme for deploying large language models (LLMs) on resource-constrained devices, the representativeness and universality of calibration data remain a…

Machine Learning · Computer Science 2026-01-19 Haiyang Xiao, Weiqing Li, Jinyue Guo, Guochao Jiang, Guohua Liu, Yuewei Zhang

QERA: an Analytical Framework for Quantization Error Reconstruction

The growing number of parameters and computational demands of large language models (LLMs) present significant challenges for their efficient deployment. Recently, there has been increasing interest in quantizing weights to extremely low…

Machine Learning · Computer Science 2025-02-18 Cheng Zhang, Jeffrey T. H. Wong, Can Xiao, George A. Constantinides, Yiren Zhao

Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners

The quantization of large language models (LLMs) has been a prominent research area aimed at enabling their lightweight deployment in practice. Existing research on LLM quantization has mainly explored the interplay between weights and…

Computation and Language · Computer Science 2025-05-16 Yifei Gao, Jie Ou, Lei Wang, Jun Cheng, Mengchu Zhou

PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models

Large Language Models (LLMs) suffer severe performance degradation when facing extremely low-bit (sub-2-bit) quantization. Several existing sub-2-bit post-training quantization (PTQ) methods utilize a mixed-precision scheme by leveraging an…

Machine Learning · Computer Science 2025-08-07 Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, Yaowei Wang, Min Zhang

Error Propagation Mechanisms and Compensation Strategies for Quantized Diffusion

Diffusion models have transformed image synthesis by establishing unprecedented quality and creativity benchmarks. Nevertheless, their large-scale deployment faces challenges due to computationally intensive iterative denoising processes.…

Computer Vision and Pattern Recognition · Computer Science 2026-04-02 Songwei Liu, Chao Zeng, Chenqian Yan, Xurui Peng, Xing Wang, Fangmin Chen, Xing Mei

Optimizing Large Language Models through Quantization: A Comparative Analysis of PTQ and QAT Techniques

This paper presents a comprehensive analysis of quantization techniques for optimizing Large Language Models (LLMs), specifically focusing on Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). Through empirical…

Machine Learning · Computer Science 2024-11-12 Jahid Hasan

CoreQ: Learning-Free Mismatch Correction and Successive Rounding for Quantization

Post-training quantization (PTQ) enables efficient deployment of large language models by mapping pretrained weights to low-bit formats without retraining, typically using a small calibration set to minimize a layer-wise calibration…

Machine Learning · Computer Science 2026-05-12 Seohyeon Cha, Huancheng Chen, Dongjun Kim, Haoran Zhang, Kevin Chan, Gustavo de Veciana, Haris Vikalo
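
The layer-wise calibration objective mentioned in the CoreQ abstract can be made concrete with a small sketch: quantize a weight matrix W, then score the result by the output error ||XW - XŴ||_F^2 on calibration inputs X. As an assumed and deliberately simple use of that objective, the sketch grid-searches a clipping ratio for symmetric round-to-nearest quantization. It is not CoreQ's mismatch correction or successive rounding, and the helper names (`quantize_rtn`, `layer_loss`, `search_clip`) and the toy data are hypothetical.

```python
# Hypothetical sketch: choose a clipping ratio that minimizes a layer-wise
# calibration loss. Not CoreQ's algorithm.

def quantize_rtn(w, bits, clip=1.0):
    """Symmetric per-tensor round-to-nearest with the absmax range scaled by `clip`."""
    qmax = 2 ** (bits - 1) - 1
    absmax = max(abs(x) for row in w for x in row) * clip
    scale = (absmax / qmax) if absmax > 0 else 1.0
    # Weights outside the clipped range saturate at +/- qmax * scale.
    return [[max(-qmax, min(qmax, round(x / scale))) * scale for x in row] for row in w]

def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def layer_loss(x, w, w_hat):
    """Layer-wise calibration objective: squared Frobenius norm of X W - X W_hat."""
    y, y_hat = matmul(x, w), matmul(x, w_hat)
    return sum((p - q) ** 2 for rp, rq in zip(y, y_hat) for p, q in zip(rp, rq))

def search_clip(x, w, bits=3, ratios=(1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7)):
    """Return (ratio, loss) for the clipping ratio with the lowest calibration loss."""
    best = None
    for r in ratios:
        loss = layer_loss(x, w, quantize_rtn(w, bits, r))
        if best is None or loss < best[1]:
            best = (r, loss)
    return best

# Toy layer: W is (in=3, out=2) with one outlier; X holds 4 calibration rows.
W = [[0.12, -0.40], [2.50, 0.31], [-0.27, 0.08]]
X = [[0.5, 0.1, -0.3], [1.0, -0.2, 0.4], [-0.7, 0.05, 0.9], [0.3, 0.6, -0.1]]
best = search_clip(X, W, bits=3)
print(f"best clip ratio = {best[0]}, layer loss = {best[1]:.5f}")
```

Including the ratio 1.0 (plain absmax scaling) in the grid guarantees the search never does worse than unclipped round-to-nearest on the calibration set; clipping can help when a single outlier such as 2.5 inflates the quantization step for every other weight.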

A Comprehensive Evaluation on Quantization Techniques for Large Language Models

For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is rapidly evolving. Though many papers report breakthrough results, they are often…

Machine Learning · Computer Science 2026-01-30 Yutong Liu, Cairong Zhao, Guosheng Hu