Related papers: Quantization Error Propagation: Revisiting Layer-W…

200 papers

Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement…

Machine Learning · Computer Science 2026-02-04 Yinggan Xu, Risto Miikkulainen, Xin Qiu

Network quantization has gained increasing attention with the rapid growth of large pre-trained language models (PLMs). However, most existing quantization methods for PLMs follow quantization-aware training (QAT) that requires end-to-end…

Computation and Language · Computer Science 2021-10-01 Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu

Post-Training Quantization (PTQ) has emerged as an effective technique for alleviating the substantial computational and memory overheads of Vision-Language Models (VLMs) by compressing both weights and activations without retraining the…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Chenwei Jia, Baoting Li, Xuchong Zhang, Mingzhuo Wei, Bochen Lin, Hongbin Sun

Large language models (LLMs) have shown remarkable performance in various domains, but they are constrained by massive computational and storage costs. Quantization, an effective technique for compressing models to fit resource-limited…

Computation and Language · Computer Science 2026-04-14 Han Liu, Haotian Gao, Xiaotong Zhang, Changya Li, Feng Zhang, Wei Wang, Fenglong Ma, Hong Yu

Large language models (LLMs) have significantly advanced natural language processing, but their massive parameter counts create substantial computational and memory challenges during deployment. Post-training quantization (PTQ) has emerged…

Machine Learning · Computer Science 2025-11-25 Cuong Pham, Hoang Anh Dung, Cuong C. Nguyen, Trung Le, Gustavo Carneiro, Thanh-Toan Do

Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive sizes hinder deployment on resource-constrained devices. To reduce their computational and memory burden, various compression…

Machine Learning · Computer Science 2026-05-18 Dung Anh Hoang, Cuong Pham, Cuong Nguyen, Trung Le, Jianfei Cai, Thanh-Toan Do

Post-training quantization of Large Language Models (LLMs) is challenging. In this work, we introduce Low-rank Quantization Error Reduction (LQER), which combines quantization and low-rank approximation to recover the model capability. LQER…

Machine Learning · Computer Science 2024-05-31 Cheng Zhang, Jianyi Cheng, George A. Constantinides, Yiren Zhao

Post-training quantization (PTQ) of large language models (LLMs) to extremely low bit-widths remains challenging due to the fundamental trade-off between computational efficiency and representational capacity. While existing ultra-low-bit…

Machine Learning · Computer Science 2026-01-05 He Xiao, Runming Yang, Qingyao Yang, Wendong Xu, Zhen Li, Yupeng Su, Zhengwu Liu, Hongxia Yang, Ngai Wong

Looped language models (LoopLMs) improve parameter efficiency by recursively reusing Transformer blocks, enabling deeper computation under a fixed model size. However, this reuse makes LoopLMs more fragile under post-training quantization…

Machine Learning · Computer Science 2026-05-19 Rui Fang, Hsi-Wen Chen, Ming-Syan Chen

Low-bit post-training quantization (PTQ) is a practical route to deploy reasoning-capable LLMs under tight memory and latency budgets, yet it can markedly impair mathematical reasoning (drops up to 69.81% in our harder settings). We address…

Machine Learning · Computer Science 2026-01-21 Zhen Li, Yupeng Su, Songmiao Wang, Runming Yang, Congkai Xie, Aofan Liu, Ming Li, Jiannong Cao, Yuan Xie, Ngai Wong, Hongxia Yang

As the size of large language models (LLMs) continues to grow, model compression without sacrificing accuracy has become a crucial challenge for deployment. While some quantization methods, such as GPTQ, have made progress in achieving…

Machine Learning · Computer Science 2023-12-14 Liang Li, Qingyuan Li, Bo Zhang, Xiangxiang Chu

Post-training quantization (PTQ) is a primary approach for deploying large language models without fine-tuning, and the quantized performance is often strongly affected by the calibration in PTQ. By contrast, in vision-language models…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Zhenhao Shang, Haizhao Jing, Guoting Wei, Haokui Zhang, Rong Xiao, Jianqing Gao, Peng Wang

Although post-training quantization (PTQ) provides an efficient numerical compression scheme for deploying large language models (LLMs) on resource-constrained devices, the representativeness and universality of calibration data remain a…

Machine Learning · Computer Science 2026-01-19 Haiyang Xiao, Weiqing Li, Jinyue Guo, Guochao Jiang, Guohua Liu, Yuewei Zhang

The growing number of parameters and computational demands of large language models (LLMs) present significant challenges for their efficient deployment. Recently, there is an increasing interest in quantizing weights to extremely low…

Machine Learning · Computer Science 2025-02-18 Cheng Zhang, Jeffrey T. H. Wong, Can Xiao, George A. Constantinides, Yiren Zhao

The quantization of large language models (LLMs) has been a prominent research area aimed at enabling their lightweight deployment in practice. Existing research about LLM's quantization has mainly explored the interplay between weights and…

Computation and Language · Computer Science 2025-05-16 Yifei Gao, Jie Ou, Lei Wang, Jun Cheng, Mengchu Zhou

Large Language Models (LLMs) suffer severe performance degradation when facing extremely low-bit (sub 2-bit) quantization. Several existing sub 2-bit post-training quantization (PTQ) methods utilize a mix-precision scheme by leveraging an…

Machine Learning · Computer Science 2025-08-07 Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, Yaowei Wang, Min Zhang

Diffusion models have transformed image synthesis by establishing unprecedented quality and creativity benchmarks. Nevertheless, their large-scale deployment faces challenges due to computationally intensive iterative denoising processes.…

Computer Vision and Pattern Recognition · Computer Science 2026-04-02 Songwei Liu, Chao Zeng, Chenqian Yan, Xurui Peng, Xing Wang, Fangmin Chen, Xing Mei

This paper presents a comprehensive analysis of quantization techniques for optimizing Large Language Models (LLMs), specifically focusing on Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). Through empirical…

Machine Learning · Computer Science 2024-11-12 Jahid Hasan

Post-training quantization (PTQ) enables efficient deployment of large language models by mapping pretrained weights to low-bit formats without retraining, typically using a small calibration set to minimize a layer-wise calibration…

Machine Learning · Computer Science 2026-05-12 Seohyeon Cha, Huancheng Chen, Dongjun Kim, Haoran Zhang, Kevin Chan, Gustavo de Veciana, Haris Vikalo

For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is rapidly evolving. Though many papers report breakthrough results, they are often…

Machine Learning · Computer Science 2026-01-30 Yutong Liu, Cairong Zhao, Guosheng Hu