Related papers: Achieving binary weight and activation for LLMs us…

200 papers

In the era of large-scale language models, the substantial parameter size poses significant challenges for deployment. Being a prevalent compression technique, quantization has emerged as the mainstream practice to tackle this issue, which…

Computation and Language · Computer Science · 2023-08-31 · Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo Zhang, Xiangxiang Chu, Yerui Sun, Li Du, Yuchen Xie

Large language models (LLMs) have driven major progress in NLP, yet their substantial memory and compute demands still hinder practical deployment. Binarization can compress weights to 1 bit, fundamentally lowering compute and bandwidth…

Machine Learning · Computer Science · 2026-05-04 · Zhixiong Zhao, Zukang Xu, Dawei Yang
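
In its most basic form, 1-bit weight binarization keeps only the sign of each weight plus one floating-point scale per output row. The sketch below assumes the common XNOR-Net-style scale, the mean absolute value of the row, which minimizes the squared reconstruction error for a fixed sign pattern. It is a generic baseline, not the method of the paper above, and `binarize_rows` / `dequantize_rows` are hypothetical names.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def binarize_rows(weights: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Binarize each row to signs in {-1, +1} with one scale per row.

    For a fixed sign pattern B = sign(W), the scale alpha that minimizes
    ||W - alpha * B||^2 is mean(|W|), so that is the value stored.
    """
    signs: List[List[int]] = []
    scales: List[float] = []
    for row in weights:
        # Zeros map to +1 so every entry fits in a single bit.
        signs.append([1 if w >= 0 else -1 for w in row])
        scales.append(sum(abs(w) for w in row) / len(row))
    return signs, scales


def dequantize_rows(signs: List[List[int]], scales: List[float]) -> Matrix:
    """Reconstruct an approximate weight matrix: W_hat[i] = alpha_i * B[i]."""
    return [[a * s for s in row] for row, a in zip(signs, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.5, 1.0, -1.0],
         [0.25, 0.25, -0.25, 0.75]]
    b, alpha = binarize_rows(w)
    print(b)      # [[1, -1, 1, -1], [1, 1, -1, 1]]
    print(alpha)  # [1.0, 0.375]
```

Storing one sign bit per weight plus one scale per row brings a 16-bit weight matrix close to 1 bit per weight. Several of the papers listed here propose ways to recover the accuracy that this naive scheme loses.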

Large Language Models (LLMs) are proficient in natural language processing tasks, but their deployment is often restricted by extensive parameter sizes and computational demands. This paper focuses on post-training quantization (PTQ) in…

Computation and Language · Computer Science · 2024-07-19 · Janghwan Lee, Minsoo Kim, Seungcheol Baek, Seok Joong Hwang, Wonyong Sung, Jungwook Choi
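
Post-training quantization in its simplest form needs no retraining: choose a scale from the data and round each value to the nearest integer level. The sketch below is a generic symmetric, per-tensor, round-to-nearest baseline, assumed for illustration. It is not the method of any listed paper, and the helper names are hypothetical.

```python
from typing import List, Tuple


def quantize_symmetric(x: List[float], bits: int) -> Tuple[List[int], float]:
    """Round-to-nearest symmetric quantization with one per-tensor scale.

    Integers are clamped to [-(2^(bits-1) - 1), 2^(bits-1) - 1], so the grid
    is symmetric around zero. Python's round() resolves ties to even.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in x)
    # The largest magnitude maps exactly onto the top integer level.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in x]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer levels back to real values."""
    return [v * scale for v in q]


if __name__ == "__main__":
    q, s = quantize_symmetric([0.5, -3.5, 1.2, 0.0], bits=4)
    print(q, s)              # [1, -7, 2, 0] 0.5
    print(dequantize(q, s))  # [0.5, -3.5, 1.0, 0.0]
```

The value 1.2 comes back as 1.0; that rounding error is what PTQ methods try to reduce, for example with finer-grained scales or better handling of outliers.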

Deploying large language models (LLMs) in resource-constrained environments is hindered by heavy computational and memory requirements. We present LBLLM, a lightweight binarization framework that achieves effective W(1+1)A4 quantization…

Machine Learning · Computer Science · 2026-04-22 · Siqing Song, Chuang Wang, Yong Lang, Yi Yang, Xu-Yao Zhang

With the commercialization of large language models (LLMs), weight-activation quantization has emerged to compress and accelerate LLMs, achieving high throughput while reducing inference costs. However, existing post-training quantization…

Machine Learning · Computer Science · 2025-02-11 · Jung Hyun Lee, Jeonghoon Kim, June Yong Yang, Se Jung Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee

Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive sizes hinder deployment on resource-constrained devices. To reduce their computational and memory burden, various compression…

Machine Learning · Computer Science · 2026-05-18 · Dung Anh Hoang, Cuong Pham, Cuong Nguyen, Trung le, Jianfei Cai, Thanh-Toan Do

We propose LLM-FP4 for quantizing both weights and activations in large language models (LLMs) down to 4-bit floating-point values, in a post-training manner. Existing post-training quantization (PTQ) solutions are primarily integer-based…

Computation and Language · Computer Science · 2024-04-30 · Shih-yang Liu, Zechun Liu, Xijie Huang, Pingcheng Dong, Kwang-Ting Cheng
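
LLM-FP4 targets 4-bit floating-point values rather than integers. The excerpt above does not specify the FP4 layout, so the sketch below assumes E2M1 (1 sign, 2 exponent and 1 mantissa bit, exponent bias 1, with subnormals), whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6, together with a single per-tensor max-abs scale. It illustrates how FP4 spaces its levels non-uniformly, finer near zero, unlike the evenly spaced INT4 grid. `quantize_fp4` is a hypothetical name, and this is not the paper's algorithm.

```python
from typing import List, Tuple

# Non-negative magnitudes representable by an assumed 4-bit E2M1 float.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_fp4(x: List[float]) -> Tuple[List[float], float]:
    """Scale x so its max magnitude maps to 6.0, then snap to the E2M1 grid.

    Returns the grid values (in scaled units) and the per-tensor scale.
    """
    max_abs = max(abs(v) for v in x)
    scale = max_abs / E2M1_GRID[-1] if max_abs > 0 else 1.0
    out: List[float] = []
    for v in x:
        mag = abs(v) / scale
        # Nearest grid point; min() keeps the first match, so ties go to
        # the smaller magnitude.
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        out.append(nearest if v >= 0 else -nearest)
    return out, scale


if __name__ == "__main__":
    q, s = quantize_fp4([12.0, -5.5, 2.6, 0.4])
    print(q, s)                 # [6.0, -3.0, 1.5, 0.0] 2.0
    print([v * s for v in q])   # [12.0, -6.0, 3.0, 0.0]
```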

Large language models (LLMs) such as GPT-4 and Llama have wide applications in natural language processing (NLP). However, with the exponential growth of model parameter sizes, LLMs bring significant resource overheads. Low-bit…

Computation and Language · Computer Science · 2025-02-27 · Liangdong Liu, Zhitong Zheng, Cong Wang, Tianhuang Su, Zhenyu Yang

Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges. This paper explores the potential of…

Machine Learning · Computer Science · 2024-10-17 · Sayeh Sharify, Utkarsh Saxena, Zifei Xu, Wanzin Yazar, Ilya Soloveychik, Xin Wang

Pretrained large language models (LLMs) exhibit exceptional general language processing capabilities but come with significant demands on memory and computational resources. As a powerful compression technology, binarization can extremely…

Machine Learning · Computer Science · 2024-06-19 · Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi

Post-training quantization (PTQ) is a promising approach to reducing the storage and computational requirements of large language models (LLMs) without additional training cost. Recent PTQ studies have primarily focused on quantizing only…

Machine Learning · Computer Science · 2026-02-17 · Reena Elangovan, Charbel Sakr, Anand Raghunathan, Brucek Khailany

Post-training quantization (PTQ) has emerged as a promising technique for mitigating memory consumption and computational costs in large language models (LLMs). However, a systematic examination of various quantization schemes, model…

Machine Learning · Computer Science · 2023-05-29 · Zhewei Yao, Xiaoxia Wu, Cheng Li, Stephen Youn, Yuxiong He

The size of a model has been a strong predictor of its quality, as well as its cost. As such, the trade-off between model cost and quality has been well-studied. Post-training optimizations like quantization and pruning have typically…

Machine Learning · Computer Science · 2025-08-29 · Giuseppe Franco, Pablo Monteagudo-Lago, Ian Colbert, Nicholas Fraser, Michaela Blott

Large language models (LLMs) require immense resources for training and inference. Quantization, a technique that reduces the precision of model parameters, offers a promising solution for improving LLM efficiency and sustainability. While…

Machine Learning · Computer Science · 2025-02-18 · Jacob Nielsen, Peter Schneider-Kamp, Lukas Galke

1-bit LLM quantization offers significant advantages in reducing storage and computational costs. However, existing methods typically train 1-bit LLMs from scratch, failing to fully leverage pre-trained models. This results in high training…

Computation and Language · Computer Science · 2026-05-19 · Zhijun Tu, Jian Li, Yuanyuan Xi, Siqi Liu, Chuanjian Liu, Hanting Chen, Jie Hu, Yunhe Wang

We consider the problem of model compression for Large Language Models (LLMs) at post-training time, where the task is to compress a well-trained model using only a small set of calibration input data. In this work, we introduce a new…

Machine Learning · Statistics · 2024-12-12 · Meyer Scetbon, James Hensman

Large Language Models (LLMs) suffer severe performance degradation when facing extremely low-bit (sub-2-bit) quantization. Several existing sub-2-bit post-training quantization (PTQ) methods utilize a mixed-precision scheme by leveraging an…

Machine Learning · Computer Science · 2025-08-07 · Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, Yaowei Wang, Min Zhang

Large language models (LLMs) exhibit excellent performance across a variety of tasks, but they come with significant computational and storage costs. Quantizing these models is an effective way to alleviate this issue. However, existing…

Machine Learning · Computer Science · 2023-11-14 · Baisong Li, Xingwang Wang, Haixiao Xu

Weight-only quantization has become a standard approach for efficiently serving large language models (LLMs). However, existing methods fail to efficiently compress models to binary (1-bit) levels, as they either require large amounts of…

Machine Learning · Computer Science · 2026-05-19 · Hyochan Chong, Dongkyu Kim, Changdong Kim, Minseop Choi

Due to their large size, generative Large Language Models (LLMs) require significant computing and storage resources. This paper introduces a new post-training quantization method, GPTQT, to reduce memory usage and enhance processing speed…

Machine Learning · Computer Science · 2024-07-04 · Yipin Guo, Yilin Lang, Qinyuan Ren