Related papers: Exploiting LLM Quantization

200 papers

LLM quantization has become essential for memory-efficient deployment. Recent work has shown that quantization schemes can pose critical security risks: an adversary may release a model that appears benign in full precision but exhibits…

Machine Learning · Computer Science 2026-05-15 Xiaohua Zhan, Kazuki Egashira, Robin Staab, Mark Vero, Martin Vechev

Large Language Models (LLMs) have been extensively researched and used in both academia and industry since the rise in popularity of the Transformer model, which demonstrates excellent performance in AI. However, the computational demands…

Machine Learning · Computer Science 2024-11-06 Jiedong Lang, Zhehao Guo, Shuyu Huang

Large language models of code exhibit high capability in performing diverse software engineering tasks, such as code translation, defect detection, text-to-code generation, and code summarization. While their ability to enhance developer…

Software Engineering · Computer Science 2025-05-21 Aftab Hussain, Sadegh AlMahdi Kazemi Zarkouei, Md Rafiqul Islam Rabin, Mohammad Amin Alipour, Sen Lin, Bowen Xu

The growing scale of large language models (LLMs) not only demands extensive computational resources but also raises environmental concerns due to their increasing carbon footprint. Model quantization emerges as an effective approach that…

Software Engineering · Computer Science 2025-07-15 Saima Afrin, Bowen Xu, Antonio Mastropaolo

With the increasing size of frontier LLMs, post-training quantization has become the standard for memory-efficient deployment. Recent work has shown that basic rounding-based quantization schemes pose security risks, as they can be…

Cryptography and Security · Computer Science 2025-06-05 Kazuki Egashira, Robin Staab, Mark Vero, Jingxuan He, Martin Vechev

Deploying Large Language Models (LLMs) on edge or mobile devices offers significant benefits, such as enhanced data privacy and real-time processing capabilities. However, it also faces critical challenges due to the substantial memory…

Machine Learning · Computer Science 2024-05-07 Yu Mao, Weilan Wang, Hongchao Du, Nan Guan, Chun Jason Xue

Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques,…

Computation and Language · Computer Science 2024-06-07 Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

Quantization enables efficient deployment of large language models (LLMs) in resource-constrained environments by significantly reducing memory and computation costs. While quantized LLMs often maintain performance on perplexity and…

Artificial Intelligence · Computer Science 2025-08-28 Yao Fu, Xianxuan Long, Runchao Li, Haotian Yu, Mu Sheng, Xiaotian Han, Yu Yin, Pan Li

Large language models (LLMs) have exhibited exciting progress in multiple scenarios, while the huge computational demands hinder their deployments in lots of real-world applications. As an effective means to reduce memory footprint and…

Machine Learning · Computer Science 2024-06-21 Yijun Liu, Yuan Meng, Fang Wu, Shenhao Peng, Hang Yao, Chaoyu Guan, Chen Tang, Xinzhu Ma, Zhi Wang, Wenwu Zhu

Quantization has emerged as a mainstream method for compressing Large Language Models (LLMs), reducing memory requirements and accelerating inference without architectural modifications. While existing research primarily focuses on…

Software Engineering · Computer Science 2025-07-01 Sen Fang, Weiyuan Ding, Antonio Mastropaolo, Bowen Xu

Large language models (LLMs) have been proven capable of memorizing their training data, which can be extracted through specifically designed prompts. As the scale of datasets continues to grow, privacy risks arising from memorization have…

Computation and Language · Computer Science 2023-11-07 Zhenhong Zhou, Jiuyang Xiang, Chaomeng Chen, Sen Su

Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence with their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements…

Machine Learning · Computer Science 2024-10-10 Ruihao Gong, Yang Yong, Shiqiao Gu, Yushi Huang, Chengtao Lv, Yunchen Zhang, Xianglong Liu, Dacheng Tao

Large language models have achieved significant advancements in complex mathematical reasoning benchmarks, such as MATH. However, their substantial computational requirements present challenges for practical deployment. Model quantization…

Computation and Language · Computer Science 2025-02-25 Zhen Li, Yupeng Su, Runming Yang, Congkai Xie, Zheng Wang, Zhongwei Xie, Ngai Wong, Hongxia Yang

This paper presents a comprehensive analysis of quantization techniques for optimizing Large Language Models (LLMs), specifically focusing on Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). Through empirical…

Machine Learning · Computer Science 2024-11-12 Jahid Hasan

Large language models (LLMs) achieve strong performance but incur high deployment costs, motivating extremely low-bit but lossy quantization. Existing quantization algorithms mainly focus on improving the numerical accuracy of forward…

Computation and Language · Computer Science 2026-05-18 Yuzhuang Xu, Xu Han, Yuxuan Li, Pengzhan Li, Wanxiang Che

Recent studies introduced effective compression techniques for Large Language Models (LLMs) via post-training quantization or low-bit weight representation. Although quantized weights offer storage efficiency and allow for faster inference,…

Computation and Language · Computer Science 2024-05-02 Irina Proskurina, Luc Brun, Guillaume Metzler, Julien Velcin

Large language models for code (LLMs4Code) rely heavily on massive training data, including sensitive data, such as cloud service credentials of the projects and personally identifiable information of the developers, raising serious privacy…

Software Engineering · Computer Science 2025-08-04 Md Nazmul Haque, Hua Yang, Zhou Yang, Bowen Xu

Quantization is a popular technique that transforms the parameter representation of a neural network from floating-point numbers into lower-precision ones (e.g., 8-bit integers). It reduces the memory footprint and the computational…

Machine Learning · Computer Science 2021-11-12 Sanghyun Hong, Michael-Andrei Panaitescu-Liess, Yiğitcan Kaya, Tudor Dumitraş

Recent advancements in large language models (LLMs) have shown their remarkable capacities in many NLP tasks. However, their substantial size often presents challenges for deployment. This necessitates efficient techniques for model…

Computation and Language · Computer Science 2026-05-20 Robin Baki Davidsson, Pierre Nugues

Large Language Models for code (LLMs4Code) are increasingly used to generate software artifacts, including library and package recommendations in languages such as Go. However, recent evidence shows that LLMs frequently hallucinate package…

Software Engineering · Computer Science 2025-12-10 Md Nazmul Haque, Elizabeth Lin, Lawrence Arkoh, Biruk Tadesse, Bowen Xu
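Several abstracts in this listing describe quantization as mapping floating-point weights to lower-precision integers (e.g., 8-bit) by rounding. Some also note that rounding-based schemes can be exploited. The following is a minimal sketch of that mapping, assuming symmetric per-tensor round-to-nearest int8 quantization. This is one common scheme; the listed papers may study others. The sketch also shows one consequence of rounding: each int8 code covers an interval of full-precision values, so two different float models can quantize to the identical int8 model. The names `quantize_int8`, `dequantize`, and `rounding_interval` are hypothetical, and NumPy is assumed.

```python
import numpy as np


def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor round-to-nearest int8 quantization (assumed scheme)."""
    scale = float(np.max(np.abs(w))) / 127.0  # largest |w| maps to 127
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any positive scale reproduces it
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale


def rounding_interval(q_val: int, scale: float) -> tuple[float, float]:
    """Full-precision range that rounds to q_val for a fixed scale
    (exact .5 boundaries aside, since np.round rounds half to even)."""
    return (q_val - 0.5) * scale, (q_val + 0.5) * scale


w = np.array([1.27, -0.5, 0.03, 0.0], dtype=np.float32)
q, s = quantize_int8(w)

# Move one non-maximal weight inside its rounding interval; the scale is
# unchanged because the largest |w| is untouched.
lo, hi = rounding_interval(int(q[1]), s)
w_shifted = w.copy()
w_shifted[1] = np.float32(lo + 0.9 * (hi - lo))
q2, s2 = quantize_int8(w_shifted)

print(q)                      # [127 -50   3   0]
print(np.array_equal(q, q2))  # True: different floats, same int8 model
```

In this sketch the scale comes from the largest absolute weight, so the interval argument only holds while that maximum is left unchanged. Per-channel or group-wise schemes change the bookkeeping but keep the same many-to-one rounding.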