
Related papers: QTIP: Quantization with Trellises and Incoherence …

200 papers

Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low precision. In this work, we introduce QuIP#, a weight-only PTQ method that achieves state-of-the-art results in extreme compression…

Machine Learning · Computer Science 2024-06-05 Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, Christopher De Sa

Post-training quantization (PTQ) of large language models (LLMs) to extremely low bit-widths remains challenging due to the fundamental trade-off between computational efficiency and representational capacity. While existing ultra-low-bit…

Machine Learning · Computer Science 2026-01-05 He Xiao, Runming Yang, Qingyao Yang, Wendong Xu, Zhen Li, Yupeng Su, Zhengwu Liu, Hongxia Yang, Ngai Wong

This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from $\textit{incoherent}$…

Machine Learning · Computer Science 2024-01-17 Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Christopher De Sa

We study weight-only post-training quantization (PTQ), which quantizes the weights of a large language model (LLM) without retraining, using little or no calibration data. Weight-only PTQ is crucial for reducing the memory footprint and…

Machine Learning · Computer Science 2025-10-23 Deokjae Lee, Hyun Oh Song

Large language models (LLMs) have shown remarkable performance in various domains, but they are constrained by massive computational and storage costs. Quantization, an effective technique for compressing models to fit resource-limited…

Computation and Language · Computer Science 2026-04-14 Han Liu, Haotian Gao, Xiaotong Zhang, Changya Li, Feng Zhang, Wei Wang, Fenglong Ma, Hong Yu

Due to their large size, generative Large Language Models (LLMs) require significant computing and storage resources. This paper introduces a new post-training quantization method, GPTQT, to reduce memory usage and enhance processing speed…

Machine Learning · Computer Science 2024-07-04 Yipin Guo, Yilin Lang, Qinyuan Ren

Scaling model size significantly challenges the deployment and inference of Large Language Models (LLMs). Due to the redundancy in LLM weights, recent research has focused on pushing weight-only quantization to extremely low-bit (even down…

Artificial Intelligence · Computer Science 2024-10-23 Yifei Liu, Jicheng Wen, Yang Wang, Shengyu Ye, Li Lyna Zhang, Ting Cao, Cheng Li, Mao Yang

Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive sizes hinder deployment on resource-constrained devices. To reduce their computational and memory burden, various compression…

Machine Learning · Computer Science 2026-05-18 Dung Anh Hoang, Cuong Pham, Cuong Nguyen, Trung Le, Jianfei Cai, Thanh-Toan Do

In this work we show that the size versus accuracy trade-off of neural network quantization can be significantly improved by increasing the quantization dimensionality. We propose the GPTVQ method, a new fast method for post-training vector…

Post-training quantization (PTQ) is a primary approach for deploying large language models without fine-tuning, and the quantized performance is often strongly affected by the calibration in PTQ. By contrast, in vision-language models…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Zhenhao Shang, Haizhao Jing, Guoting Wei, Haokui Zhang, Rong Xiao, Jianqing Gao, Peng Wang

Trellis-coded quantization sets the current 2-bit post-training frontier for LLMs (QTIP), but pushing below the PTQ ceiling requires quantization-aware training, and QAT on a trellis is obstructed by the non-differentiable Viterbi argmax.…

Machine Learning · Computer Science 2026-05-12 Venugopalan Iyengar

Large Language Models (LLMs) suffer severe performance degradation when facing extremely low-bit (sub 2-bit) quantization. Several existing sub 2-bit post-training quantization (PTQ) methods utilize a mix-precision scheme by leveraging an…

Machine Learning · Computer Science 2025-08-07 Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, Yaowei Wang, Min Zhang

For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is rapidly evolving. Though many papers report breakthrough results, they are often…

Machine Learning · Computer Science 2026-01-30 Yutong Liu, Cairong Zhao, Guosheng Hu

Large language models have significantly advanced natural language processing, yet their heavy resource demands pose severe challenges regarding hardware accessibility and energy consumption. This paper presents a focused and high-level…

Artificial Intelligence · Computer Science 2025-05-14 Tollef Emil Jørgensen

Post-training quantization (PTQ) has emerged as a promising technique for mitigating memory consumption and computational costs in large language models (LLMs). However, a systematic examination of various quantization schemes, model…

Machine Learning · Computer Science 2023-05-29 Zhewei Yao, Xiaoxia Wu, Cheng Li, Stephen Youn, Yuxiong He

With the commercialization of large language models (LLMs), weight-activation quantization has emerged to compress and accelerate LLMs, achieving high throughput while reducing inference costs. However, existing post-training quantization…

Machine Learning · Computer Science 2025-02-11 Jung Hyun Lee, Jeonghoon Kim, June Yong Yang, Se Jung Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee

Post-training quantization (PTQ) is a promising approach to reducing the storage and computational requirements of large language models (LLMs) without additional training cost. Recent PTQ studies have primarily focused on quantizing only…

Machine Learning · Computer Science 2026-02-17 Reena Elangovan, Charbel Sakr, Anand Raghunathan, Brucek Khailany

Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges. This paper explores the potential of…

Machine Learning · Computer Science 2024-10-17 Sayeh Sharify, Utkarsh Saxena, Zifei Xu, Wanzin Yazar, Ilya Soloveychik, Xin Wang

Post-training quantization (PTQ) has emerged as a promising technique to reduce the cost of large language models (LLMs). Specifically, PTQ can effectively mitigate memory consumption and reduce computational overhead in LLMs. To meet the…

Computation and Language · Computer Science 2024-06-07 Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

Post-training quantization (PTQ) has emerged as a promising solution for reducing the storage and computational cost of vision transformers (ViTs). Recent advances primarily target crafting quantizers to deal with peculiar activations…

Computer Vision and Pattern Recognition · Computer Science 2025-02-10 Runqing Jiang, Ye Zhang, Longguang Wang, Pengpeng Yu, Yulan Guo