English
Related papers

Related papers: Evaluating Quantized Large Language Models

200 papers

Post-training quantization (PTQ) has emerged as a promising technique for mitigating memory consumption and computational costs in large language models (LLMs). However, a systematic examination of various quantization schemes, model…

Machine Learning · Computer Science 2023-05-29 Zhewei Yao , Xiaoxia Wu , Cheng Li , Stephen Youn , Yuxiong He

Large language models (LLMs) have transformed natural language processing but pose significant challenges for real-world deployment. These models necessitate considerable computing resources, which can be costly and frequently unavailable.…

Computation and Language · Computer Science 2025-02-17 Xiliang Zhu , Elena Khasanova , Cheng Chen

Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges. This paper explores the potential of…

Machine Learning · Computer Science 2024-10-17 Sayeh Sharify , Utkarsh Saxena , Zifei Xu , Wanzin Yazar , Ilya Soloveychik , Xin Wang

This paper presents a comprehensive analysis of quantization techniques for optimizing Large Language Models (LLMs), specifically focusing on Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). Through empirical…

Machine Learning · Computer Science 2024-11-12 Jahid Hasan

Large language models (LLMs) have shown remarkable performance in various domains, but they are constrained by massive computational and storage costs. Quantization, an effective technique for compressing models to fit resource-limited…

Computation and Language · Computer Science 2026-04-14 Han Liu , Haotian Gao , Xiaotong Zhang , Changya Li , Feng Zhang , Wei Wang , Fenglong Ma , Hong Yu

For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is rapidly evolving. Though many papers report breakthrough results, they are often…

Machine Learning · Computer Science 2026-01-30 Yutong Liu , Cairong Zhao , Guosheng Hu

Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive sizes hinder deployment on resource-constrained devices. To reduce their computational and memory burden, various compression…

Machine Learning · Computer Science 2026-05-18 Dung Anh Hoang , Cuong Pham , Cuong Nguyen , Trung le , Jianfei Cai , Thanh-Toan Do

Quantization is essential for deploying large language models (LLMs) on resource-constrained hardware, but its implications for multilingual tasks remain underexplored. We conduct the first large-scale evaluation of post-training…

Computation and Language · Computer Science 2025-08-29 Benjamin Marie , Atsushi Fujita

Post-training Quantization (PTQ) technique has been extensively adopted for large language models (LLMs) compression owing to its efficiency and low resource requirement. However, current research lacks a in-depth analysis of the superior…

Machine Learning · Computer Science 2025-05-22 Jiaqi Zhao , Ming Wang , Miao Zhang , Yuzhang Shang , Xuebo Liu , Yaowei Wang , Min Zhang , Liqiang Nie

Large language models have significantly advanced natural language processing, yet their heavy resource demands pose severe challenges regarding hardware accessibility and energy consumption. This paper presents a focused and high-level…

Artificial Intelligence · Computer Science 2025-05-14 Tollef Emil Jørgensen

With the commercialization of large language models (LLMs), weight-activation quantization has emerged to compress and accelerate LLMs, achieving high throughput while reducing inference costs. However, existing post-training quantization…

Machine Learning · Computer Science 2025-02-11 Jung Hyun Lee , Jeonghoon Kim , June Yong Yang , Se Jung Kwon , Eunho Yang , Kang Min Yoo , Dongsoo Lee

Improving the efficiency of inference in Large Language Models (LLMs) is a critical area of research. Post-training Quantization (PTQ) is a popular technique, but it often faces challenges at low-bit levels, particularly in downstream…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Wenjin Ke , Zhe Li , Dong Li , Lu Tian , Emad Barsoum

Due to their large size, generative Large Language Models (LLMs) require significant computing and storage resources. This paper introduces a new post-training quantization method, GPTQT, to reduce memory usage and enhance processing speed…

Machine Learning · Computer Science 2024-07-04 Yipin Guo , Yilin Lang , Qinyuan Ren

Network quantization has gained increasing attention with the rapid growth of large pre-trained language models~(PLMs). However, most existing quantization methods for PLMs follow quantization-aware training~(QAT) that requires end-to-end…

Computation and Language · Computer Science 2021-10-01 Haoli Bai , Lu Hou , Lifeng Shang , Xin Jiang , Irwin King , Michael R. Lyu

Large language models (LLMs) have wide applications in the field of natural language processing(NLP), such as GPT-4 and Llama. However, with the exponential growth of model parameter sizes, LLMs bring significant resource overheads. Low-bit…

Computation and Language · Computer Science 2025-02-27 Liangdong Liu , Zhitong Zheng , Cong Wang , Tianhuang Su , Zhenyu Yang

As Large Language Models (LLMs) become increasingly computationally complex, developing efficient deployment strategies, such as quantization, becomes crucial. State-of-the-art Post-training Quantization (PTQ) techniques often rely on…

Machine Learning · Computer Science 2025-01-17 Alireza Ghaffari , Sharareh Younesian , Boxing Chen , Vahid Partovi Nia , Masoud Asgharian

Quantization is an essential and popular technique for improving the accessibility of large language models (LLMs) by reducing memory usage and computational costs while maintaining performance. In this study, we apply 4-bit Group Scaling…

Computation and Language · Computer Science 2025-08-18 Sahil Sk , Debasish Dhal , Sonal Khosla , Sk Shahid , Sambit Shekhar , Akash Dhaka , Shantipriya Parida , Dilip K. Prasad , Ondřej Bojar

Large Language Models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of auto-regressive text generation process. This paper addresses these challenges by focusing on…

Machine Learning · Computer Science 2024-02-21 Yuxuan Yue , Zhihang Yuan , Haojie Duanmu , Sifan Zhou , Jianlong Wu , Liqiang Nie

Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques,…

Computation and Language · Computer Science 2024-06-07 Renren Jin , Jiangcun Du , Wuwei Huang , Wei Liu , Jian Luan , Bin Wang , Deyi Xiong

Post-training quantization (PTQ) of large language models (LLMs) to extremely low bit-widths remains challenging due to the fundamental trade-off between computational efficiency and representational capacity. While existing ultra-low-bit…

Machine Learning · Computer Science 2026-01-05 He Xiao , Runming Yang , Qingyao Yang , Wendong Xu , Zhen Li , Yupeng Su , Zhengwu Liu , Hongxia Yang , Ngai Wong
‹ Prev 1 2 3 10 Next ›