Related papers: Leech Lattice Vector Quantization for Efficient LL…

200 papers

Non-parametric quantization has received much attention due to its efficiency on parameters and scalability to a large codebook. In this paper, we present a unified formulation of different non-parametric quantization methods through the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-17 Yue Zhao , Hanwen Jiang , Zhenlin Xu , Chutong Yang , Ehsan Adeli , Philipp Krähenbühl

Recent works on compression of large language models (LLM) using quantization considered reparameterizing the architecture such that weights are distributed on the sphere. This demonstrably improves the ability to quantize by increasing…

Machine Learning · Computer Science 2024-12-05 Tycho F. A. van der Ouderaa , Maximilian L. Croci , Agrin Hilmkil , James Hensman

Large Language Models (LLMs) have demonstrated remarkable capabilities but typically require extensive computational resources and memory for inference. Post-training quantization (PTQ) can effectively reduce these demands by storing…

Machine Learning · Computer Science 2026-01-27 Xi Zhang , Xiaolin Wu , Jiamang Wang , Weisi Lin

Large language models (LLMs) have recently demonstrated promising performance in many tasks. However, the high storage and computational cost of LLMs has become a challenge for deploying LLMs. Weight quantization has been widely used for…

Machine Learning · Computer Science 2025-02-11 Wen-Pu Cai , Ming-Yang Li , Wu-Jun Li

It is customary to deploy uniform scalar quantization in end-to-end optimized neural image compression methods, instead of more powerful vector quantization, due to the high complexity of the latter. Lattice vector quantization (LVQ),…

Image and Video Processing · Electrical Eng. & Systems 2024-11-26 Xi Zhang , Xiaolin Wu

In recent years, compression of large language models (LLMs) has emerged as an important problem to enable language model deployment on resource-constrained devices, reduce computational costs, and mitigate the environmental footprint of…

Machine Learning · Computer Science 2024-10-04 Sean I. Young

In this paper we introduce learnable lattice vector quantization and demonstrate its effectiveness for learning discrete representations. Our method, termed LL-VQ-VAE, replaces the vector quantization layer in VQ-VAE with lattice-based…

Machine Learning · Computer Science 2023-10-17 Ahmed Khalil , Robert Piechocki , Raul Santos-Rodriguez

The growing context length of Large Language Models (LLMs) enlarges the Key-Value (KV) cache, limiting deployment in resource-limited environments. Prior training-free approaches for KV cache compression typically rely on low-rank…

Computation and Language · Computer Science 2026-03-18 Yixuan Wang , Qingyu Shi , Jiayu Zhou , Dianbo Liu , Ziwei He , Zhouhan Lin

KV cache compression methods have mainly relied on scalar quantization techniques to reduce the memory requirements during decoding. In this work, we apply residual vector quantization, which has been widely used for high fidelity audio…

Machine Learning · Computer Science 2024-10-22 Ankur Kumar

Vision-Language Models (VLMs) achieve outstanding performance, yet their huge model size severely hinders deployment on edge devices with limited resources. As an efficient model compression technique, vector quantization (VQ) excels in…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Zhong Wang , Zukang Xu , Xing Hu , Dawei Yang

Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence with their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements…

Machine Learning · Computer Science 2024-10-10 Ruihao Gong , Yang Yong , Shiqiao Gu , Yushi Huang , Chengtao Lv , Yunchen Zhang , Xianglong Liu , Dacheng Tao

Vector Quantization (VQ) is an appealing model compression method to obtain a tiny model with less accuracy loss. While methods to obtain better codebooks and codes under fixed clustering dimensionality have been extensively studied,…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Zezhou Zhu , Yucong Zhou , Zhao Zhong

The rapid advancement of large language models (LLMs) has intensified the need for effective mechanisms to transform continuous multimodal data into discrete representations suitable for language-based processing. Discrete tokenization,…

Computation and Language · Computer Science 2025-08-01 Jindong Li , Yali Fu , Jiahong Liu , Linxiao Cao , Wei Ji , Menglin Yang , Irwin King , Ming-Hsuan Yang

As Large Language Models (LLMs) demonstrate exceptional performance across various domains, deploying LLMs on edge devices has emerged as a new trend. Quantization techniques, which reduce the size and memory requirements of LLMs, are…

Computation and Language · Computer Science 2025-05-07 Binrui Zeng , Bin Ji , Xiaodong Liu , Jie Yu , Shasha Li , Jun Ma , Xiaopeng Li , Shangwen Wang , Xinran Hong , Yongtao Tang

Large Language Models (LLMs) have been extensively researched and used in both academia and industry since the rise in popularity of the Transformer model, which demonstrates excellent performance in AI. However, the computational demands…

Machine Learning · Computer Science 2024-11-06 Jiedong Lang , Zhehao Guo , Shuyu Huang

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse natural language processing tasks. However, their extensive memory requirements, particularly due to KV cache growth during long-text understanding and…

Computation and Language · Computer Science 2025-10-14 Haoqi Yang , Yao Yao , Zuchao Li , Baoyuan Qi , Guoming Liu , Hai Zhao

In this work we show that the size versus accuracy trade-off of neural network quantization can be significantly improved by increasing the quantization dimensionality. We propose the GPTVQ method, a new fast method for post-training vector…

Recent advancements in Large Language Models (LLMs) have spurred interest in numerous applications requiring robust long-range capabilities, essential for processing extensive input contexts and continuously generating extended outputs. As…

Machine Learning · Computer Science 2025-07-22 Dachuan Shi , Yonggan Fu , Xiangchi Yuan , Zhongzhi Yu , Haoran You , Sixu Li , Xin Dong , Jan Kautz , Pavlo Molchanov , Yingyan Lin

Large Language Models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of the auto-regressive text generation process. This paper addresses these challenges by focusing on…

Machine Learning · Computer Science 2024-02-21 Yuxuan Yue , Zhihang Yuan , Haojie Duanmu , Sifan Zhou , Jianlong Wu , Liqiang Nie

Large Language Models (LLMs) have achieved remarkable success but face significant computational and memory challenges, particularly due to their extensive output vocabularies. The final linear projection layer, mapping hidden states to…

Computation and Language · Computer Science 2025-05-16 Jintian Shao , Hongyi Huang , Jiayi Wu , YiMing Cheng , ZhiYu Wu , You Shan , MingKai Zheng