Related papers: A Comprehensive Study on Quantization Techniques f…

200 papers

Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques,…

Computation and Language · Computer Science 2024-06-07 Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

Large language models (LLMs) have achieved remarkable advancements in natural language processing, showcasing exceptional performance across various tasks. However, the expensive memory and computational requirements present significant…

Artificial Intelligence · Computer Science 2025-11-13 Ruihao Gong, Yifu Ding, Zining Wang, Chengtao Lv, Xingyu Zheng, Jinyang Du, Haotong Qin, Jinyang Guo, Michele Magno, Xianglong Liu

Deploying Large Language Models (LLMs) on edge or mobile devices offers significant benefits, such as enhanced data privacy and real-time processing capabilities. However, it also faces critical challenges due to the substantial memory…

Machine Learning · Computer Science 2024-05-07 Yu Mao, Weilan Wang, Hongchao Du, Nan Guan, Chun Jason Xue

This paper presents a comprehensive analysis of quantization techniques for optimizing Large Language Models (LLMs), specifically focusing on Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). Through empirical…

Machine Learning · Computer Science 2024-11-12 Jahid Hasan

Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence with their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements…

Machine Learning · Computer Science 2024-10-10 Ruihao Gong, Yang Yong, Shiqiao Gu, Yushi Huang, Chengtao Lv, Yunchen Zhang, Xianglong Liu, Dacheng Tao

Large language models (LLMs) have wide applications in the field of natural language processing (NLP), such as GPT-4 and Llama. However, with the exponential growth of model parameter sizes, LLMs bring significant resource overheads. Low-bit…

Computation and Language · Computer Science 2025-02-27 Liangdong Liu, Zhitong Zheng, Cong Wang, Tianhuang Su, Zhenyu Yang

Large language models (LLMs) deliver impressive results for a variety of tasks, but state-of-the-art systems require fast GPUs with large amounts of memory. To reduce both the memory and latency of these systems, practitioners quantize…

Computer Vision and Pattern Recognition · Computer Science 2026-01-22 Gautom Das, Vincent La, Ethan Lau, Abhinav Shrivastava, Matthew Gwilliam

Large language models (LLMs) have exhibited exciting progress in multiple scenarios, while the huge computational demands hinder their deployments in lots of real-world applications. As an effective means to reduce memory footprint and…

Machine Learning · Computer Science 2024-06-21 Yijun Liu, Yuan Meng, Fang Wu, Shenhao Peng, Hang Yao, Chaoyu Guan, Chen Tang, Xinzhu Ma, Zhi Wang, Wenwu Zhu

Large Language Models (LLMs) have transformed natural language processing tasks successfully. Yet, their large size and high computational needs pose challenges for practical use, especially in resource-limited settings. Model compression…

Computation and Language · Computer Science 2024-07-31 Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang

The rapid scaling of language models (LMs) has resulted in unprecedented computational, memory, and energy requirements, making their training and deployment increasingly unsustainable. Quantization has emerged as an essential compression…

In recent years, compression of large language models (LLMs) has emerged as an important problem to enable language model deployment on resource-constrained devices, reduce computational costs, and mitigate the environmental footprint of…

Machine Learning · Computer Science 2024-10-04 Sean I. Young

Deploying Large Language Models (LLMs) on resource-constrained edge devices like the Raspberry Pi presents challenges in computational efficiency, power consumption, and response latency. This paper explores quantization-based optimization…

Machine Learning · Computer Science 2025-04-04 Mahsa Ardakani, Jinendra Malekar, Ramtin Zand

In recent years, the compression of large language models (LLMs) has emerged as a key problem in facilitating LLM deployment on resource-limited devices, reducing compute costs, and mitigating the environmental footprint due to large-scale…

Machine Learning · Computer Science 2025-05-07 Sean I. Young

For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is rapidly evolving. Though many papers report breakthrough results, they are often…

Machine Learning · Computer Science 2026-01-30 Yutong Liu, Cairong Zhao, Guosheng Hu

Large language models have significantly advanced natural language processing, yet their heavy resource demands pose severe challenges regarding hardware accessibility and energy consumption. This paper presents a focused and high-level…

Artificial Intelligence · Computer Science 2025-05-14 Tollef Emil Jørgensen

Large Language Models (LLMs) have revolutionized many areas of artificial intelligence (AI), but their substantial resource requirements limit their deployment on mobile and edge devices. This survey paper provides a comprehensive overview…

Machine Learning · Computer Science 2025-09-03 Sanjay Surendranath Girija, Shashank Kapoor, Lakshit Arora, Dipen Pradhan, Aman Raj, Ankit Shetgaonkar

Quantization is an essential and popular technique for improving the accessibility of large language models (LLMs) by reducing memory usage and computational costs while maintaining performance. In this study, we apply 4-bit Group Scaling…

Computation and Language · Computer Science 2025-08-18 Sahil Sk, Debasish Dhal, Sonal Khosla, Sk Shahid, Sambit Shekhar, Akash Dhaka, Shantipriya Parida, Dilip K. Prasad, Ondřej Bojar

Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be…

Machine Learning · Computer Science 2024-03-12 Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their heavy resource demands make quantization (reducing precision to lower-bit formats) critical for efficient serving. While many…

Performance · Computer Science 2025-08-26 Tianyao Shi, Yi Ding

The growing scale of large language models (LLMs) not only demands extensive computational resources but also raises environmental concerns due to their increasing carbon footprint. Model quantization emerges as an effective approach that…

Software Engineering · Computer Science 2025-07-15 Saima Afrin, Bowen Xu, Antonio Mastropaolo