Related papers: A Comprehensive Study on Quantization Techniques f…

A Comprehensive Evaluation of Quantization Strategies for Large Language Models

Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques,…

Computation and Language · Computer Science 2024-06-07 Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms

Large language models (LLMs) have achieved remarkable advancements in natural language processing, showcasing exceptional performance across various tasks. However, the expensive memory and computational requirements present significant…

Artificial Intelligence · Computer Science 2025-11-13 Ruihao Gong, Yifu Ding, Zining Wang, Chengtao Lv, Xingyu Zheng, Jinyang Du, Haotong Qin, Jinyang Guo, Michele Magno, Xianglong Liu

On the Compressibility of Quantized Large Language Models

Deploying Large Language Models (LLMs) on edge or mobile devices offers significant benefits, such as enhanced data privacy and real-time processing capabilities. However, it also faces critical challenges due to the substantial memory…

Machine Learning · Computer Science 2024-05-07 Yu Mao, Weilan Wang, Hongchao Du, Nan Guan, Chun Jason Xue

Optimizing Large Language Models through Quantization: A Comparative Analysis of PTQ and QAT Techniques

This paper presents a comprehensive analysis of quantization techniques for optimizing Large Language Models (LLMs), specifically focusing on Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). Through empirical…

Machine Learning · Computer Science 2024-11-12 Jahid Hasan

LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit

Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence with their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements…

Machine Learning · Computer Science 2024-10-10 Ruihao Gong, Yang Yong, Shiqiao Gu, Yushi Huang, Chengtao Lv, Yunchen Zhang, Xianglong Liu, Dacheng Tao

Binary Neural Networks for Large Language Model: A Survey

Large language models (LLMs) have wide applications in the field of natural language processing (NLP), such as GPT-4 and Llama. However, with the exponential growth of model parameter sizes, LLMs bring significant resource overheads. Low-bit…

Computation and Language · Computer Science 2025-02-27 Liangdong Liu, Zhitong Zheng, Cong Wang, Tianhuang Su, Zhenyu Yang

Towards Understanding Best Practices for Quantization of Vision-Language Models

Large language models (LLMs) deliver impressive results for a variety of tasks, but state-of-the-art systems require fast GPUs with large amounts of memory. To reduce both the memory and latency of these systems, practitioners quantize…

Computer Vision and Pattern Recognition · Computer Science 2026-01-22 Gautom Das, Vincent La, Ethan Lau, Abhinav Shrivastava, Matthew Gwilliam

Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox

Large language models (LLMs) have exhibited exciting progress in multiple scenarios, while the huge computational demands hinder their deployments in lots of real-world applications. As an effective means to reduce memory footprint and…

Machine Learning · Computer Science 2024-06-21 Yijun Liu, Yuan Meng, Fang Wu, Shenhao Peng, Hang Yao, Chaoyu Guan, Chen Tang, Xinzhu Ma, Zhi Wang, Wenwu Zhu

A Survey on Model Compression for Large Language Models

Large Language Models (LLMs) have transformed natural language processing tasks successfully. Yet, their large size and high computational needs pose challenges for practical use, especially in resource-limited settings. Model compression…

Computation and Language · Computer Science 2024-07-31 Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang

Mixed-Precision Quantization for Language Models: Techniques and Prospects

The rapid scaling of language models (LMs) has resulted in unprecedented computational, memory, and energy requirements, making their training and deployment increasingly unsustainable. Quantization has emerged as an essential compression…

Machine Learning · Computer Science 2025-10-21 Mariam Rakka, Marios Fournarakis, Olga Krestinskaya, Jinane Bazzi, Khaled N. Salama, Fadi Kurdahi, Ahmed M. Eltawil, Mohammed E. Fouda

Foundations of Large Language Model Compression -- Part 1: Weight Quantization

In recent years, compression of large language models (LLMs) has emerged as an important problem to enable language model deployment on resource-constrained devices, reduce computational costs, and mitigate the environmental footprint of…

Machine Learning · Computer Science 2024-10-04 Sean I. Young

LLMPi: Optimizing LLMs for High-Throughput on Raspberry Pi

Deploying Large Language Models (LLMs) on resource-constrained edge devices like the Raspberry Pi presents challenges in computational efficiency, power consumption, and response latency. This paper explores quantization-based optimization…

Machine Learning · Computer Science 2025-04-04 Mahsa Ardakani, Jinendra Malekar, Ramtin Zand

Radio: Rate-Distortion Optimization for Large Language Model Compression

In recent years, the compression of large language models (LLMs) has emerged as a key problem in facilitating LLM deployment on resource-limited devices, reducing compute costs, and mitigating the environmental footprint due to large-scale…

Machine Learning · Computer Science 2025-05-07 Sean I. Young

A Comprehensive Evaluation on Quantization Techniques for Large Language Models

For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is rapidly evolving. Though many papers report breakthrough results, they are often…

Machine Learning · Computer Science 2026-01-30 Yutong Liu, Cairong Zhao, Guosheng Hu

Resource-Efficient Language Models: Quantization for Fast and Accessible Inference

Large language models have significantly advanced natural language processing, yet their heavy resource demands pose severe challenges regarding hardware accessibility and energy consumption. This paper presents a focused and high-level…

Artificial Intelligence · Computer Science 2025-05-14 Tollef Emil Jørgensen

Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques

Large Language Models (LLMs) have revolutionized many areas of artificial intelligence (AI), but their substantial resource requirements limit their deployment on mobile and edge devices. This survey paper provides a comprehensive overview…

Machine Learning · Computer Science 2025-09-03 Sanjay Surendranath Girija, Shashank Kapoor, Lakshit Arora, Dipen Pradhan, Aman Raj, Ankit Shetgaonkar

LLM Compression: How Far Can We Go in Balancing Size and Performance?

Quantization is an essential and popular technique for improving the accessibility of large language models (LLMs) by reducing memory usage and computational costs while maintaining performance. In this study, we apply 4-bit Group Scaling…

Computation and Language · Computer Science 2025-08-18 Sahil Sk, Debasish Dhal, Sonal Khosla, Sk Shahid, Sambit Shekhar, Akash Dhaka, Shantipriya Parida, Dilip K. Prasad, Ondřej Bojar

What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation

Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be…

Machine Learning · Computer Science 2024-03-12 Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective

Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their heavy resource demands make quantization (reducing precision to lower-bit formats) critical for efficient serving. While many…

Performance · Computer Science 2025-08-26 Tianyao Shi, Yi Ding

Is Quantization a Deal-breaker? Empirical Insights from Large Code Models

The growing scale of large language models (LLMs) not only demands extensive computational resources but also raises environmental concerns due to their increasing carbon footprint. Model quantization emerges as an effective approach that…

Software Engineering · Computer Science 2025-07-15 Saima Afrin, Bowen Xu, Antonio Mastropaolo