Related papers: How Does Quantization Affect Multilingual LLMs?

200 papers

Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques,…

Computation and Language · Computer Science 2024-06-07 Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

Large language models (LLMs) now support context windows exceeding 128K tokens, but this comes with significant memory requirements and high inference latency. Quantization can mitigate these costs, but may degrade performance. In this…

Computation and Language · Computer Science 2025-09-23 Anmol Mekala, Anirudh Atmakuru, Yixiao Song, Marzena Karpinska, Mohit Iyyer

Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be…

Machine Learning · Computer Science 2024-03-12 Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

Despite the superior performance, Large Language Models (LLMs) require significant computational resources for deployment and use. To overcome this issue, quantization methods have been widely applied to reduce the memory footprint of LLMs…

Computation and Language · Computer Science 2023-07-27 Peiyu Liu, Zikang Liu, Ze-Feng Gao, Dawei Gao, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen

Quantization is an effective technique for reducing the storage footprint and computational costs of Large Language Models (LLMs), but it often results in performance degradation. Existing post-training quantization methods typically use…

Computation and Language · Computer Science 2026-01-27 Everlyn Asiko Chimoto, Mostafa Elhoushi, Bruce A. Bassett

Large language models (LLMs) have exhibited exciting progress in multiple scenarios, while the huge computational demands hinder their deployments in lots of real-world applications. As an effective means to reduce memory footprint and…

Machine Learning · Computer Science 2024-06-21 Yijun Liu, Yuan Meng, Fang Wu, Shenhao Peng, Hang Yao, Chaoyu Guan, Chen Tang, Xinzhu Ma, Zhi Wang, Wenwu Zhu

Quantization is essential for deploying large language models (LLMs) on resource-constrained hardware, but its implications for multilingual tasks remain underexplored. We conduct the first large-scale evaluation of post-training…

Computation and Language · Computer Science 2025-08-29 Benjamin Marie, Atsushi Fujita

Quantification has been proven to be a particularly difficult linguistic phenomenon for (Multimodal) Large Language Models (MLLMs). However, given that quantification interfaces with the logic, pragmatic, and numerical domains, the exact…

Computation and Language · Computer Science 2026-03-26 Raquel Montero, Natalia Moskvina, Paolo Morosi, Tamara Serrano, Elena Pagliarini, Evelina Leivada

Recent advancements in large language models (LLMs) have shown their remarkable capacities in many NLP tasks. However, their substantial size often presents challenges for deployment. This necessitates efficient techniques for model…

Computation and Language · Computer Science 2026-05-20 Robin Baki Davidsson, Pierre Nugues

Recent studies introduced effective compression techniques for Large Language Models (LLMs) via post-training quantization or low-bit weight representation. Although quantized weights offer storage efficiency and allow for faster inference,…

Computation and Language · Computer Science 2024-05-02 Irina Proskurina, Luc Brun, Guillaume Metzler, Julien Velcin

Quantization is widely used to accelerate inference and streamline the deployment of large language models (LLMs), yet its effects on self-explanations (SEs) remain unexplored. SEs, generated by LLMs to justify their own outputs, require…

Computation and Language · Computer Science 2026-01-05 Qianli Wang, Nils Feldhus, Pepa Atanasova, Fedor Splitt, Simon Ostermann, Sebastian Möller, Vera Schmitt

The recent success of Large Language Models (LLMs) has been predominantly driven by curating the training dataset composition, scaling of model architectures and dataset sizes and advancements in pretraining objectives, leaving tokenizer…

Large language models (LLMs) have achieved top results in recent machine translation evaluations, but they are also known to be sensitive to errors and perturbations in their prompts. We systematically evaluate how both humanly plausible…

Computation and Language · Computer Science 2025-09-03 Patrícia Schmidtová, Niyati Bafna, Seth Aycock, Gianluca Vico, Wiktor Kamzela, Katharina Hämmerl, Vilém Zouhar

Large Language Models (LLMs) have been extensively researched and used in both academia and industry since the rise in popularity of the Transformer model, which demonstrates excellent performance in AI. However, the computational demands…

Machine Learning · Computer Science 2024-11-06 Jiedong Lang, Zhehao Guo, Shuyu Huang

Quantization offers a practical solution to deploy LLMs in resource-constraint environments. However, its impact on internal representations remains understudied, raising questions about the reliability of quantized models. In this study,…

Machine Learning · Computer Science 2025-11-21 Manpreet Singh, Hassan Sajjad

For consumer usage of locally deployed LLMs, the GGUF format and k_quantization are invaluable tools for maintaining the performance of the original model while reducing it to sizes deployable with consumer-grade hardware. The number of…

Computation and Language · Computer Science 2026-01-23 Karl Audun Borgersen, Morten Goodwin

Machine-translated benchmarks are widely used to assess the multilingual capabilities of large language models (LLMs), yet translation errors in these benchmarks remain underexplored, raising concerns about the reliability and comparability…

Computation and Language · Computer Science 2026-05-26 Klaudia-Doris Thellmann, Bernhard Stadler, Michael Färber, Jens Lehmann

The growing scale of large language models (LLMs) not only demands extensive computational resources but also raises environmental concerns due to their increasing carbon footprint. Model quantization emerges as an effective approach that…

Software Engineering · Computer Science 2025-07-15 Saima Afrin, Bowen Xu, Antonio Mastropaolo

Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks, leading researchers to use them for time and labor-intensive analyses. However, their capability to handle highly specialized and…

Computation and Language · Computer Science 2024-10-08 Alexander S. Choi, Syeda Sabrina Akter, JP Singh, Antonios Anastasopoulos

Post-training quantization (PTQ) has emerged as a promising technique to reduce the cost of large language models (LLMs). Specifically, PTQ can effectively mitigate memory consumption and reduce computational overhead in LLMs. To meet the…

Computation and Language · Computer Science 2024-06-07 Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang