Related papers: How Does Quantization Affect Multilingual LLMs?

A Comprehensive Evaluation of Quantization Strategies for Large Language Models

Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques,…

Computation and Language · Computer Science 2024-06-07 Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

Does quantization affect models' performance on long-context tasks?

Large language models (LLMs) now support context windows exceeding 128K tokens, but this comes with significant memory requirements and high inference latency. Quantization can mitigate these costs, but may degrade performance. In this…

Computation and Language · Computer Science 2025-09-23 Anmol Mekala, Anirudh Atmakuru, Yixiao Song, Marzena Karpinska, Mohit Iyyer

What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation

Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be…

Machine Learning · Computer Science 2024-03-12 Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study

Despite the superior performance, Large Language Models (LLMs) require significant computational resources for deployment and use. To overcome this issue, quantization methods have been widely applied to reduce the memory footprint of LLMs…

Computation and Language · Computer Science 2023-07-27 Peiyu Liu, Zikang Liu, Ze-Feng Gao, Dawei Gao, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen

Calibrating Beyond English: Language Diversity for Better Quantized Multilingual LLM

Quantization is an effective technique for reducing the storage footprint and computational costs of Large Language Models (LLMs), but it often results in performance degradation. Existing post-training quantization methods typically use…

Computation and Language · Computer Science 2026-01-27 Everlyn Asiko Chimoto, Mostafa Elhoushi, Bruce A. Bassett

Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox

Large language models (LLMs) have exhibited exciting progress in multiple scenarios, while the huge computational demands hinder their deployments in lots of real-world applications. As an effective means to reduce memory footprint and…

Machine Learning · Computer Science 2024-06-21 Yijun Liu, Yuan Meng, Fang Wu, Shenhao Peng, Hang Yao, Chaoyu Guan, Chen Tang, Xinzhu Ma, Zhi Wang, Wenwu Zhu

The Uneven Impact of Post-Training Quantization in Machine Translation

Quantization is essential for deploying large language models (LLMs) on resource-constrained hardware, but its implications for multilingual tasks remain underexplored. We conduct the first large-scale evaluation of post-training…

Computation and Language · Computer Science 2025-08-29 Benjamin Marie, Atsushi Fujita

Quantification and object perception in Multimodal Large Language Models and human linguistic cognition

Quantification has been proven to be a particularly difficult linguistic phenomenon for (Multimodal) Large Language Models (MLLMs). However, given that quantification interfaces with the logic, pragmatic, and numerical domains, the exact…

Computation and Language · Computer Science 2026-03-26 Raquel Montero, Natalia Moskvina, Paolo Morosi, Tamara Serrano, Elena Pagliarini, Evelina Leivada

K-Quantization and its Impact on Output Performance

Recent advancements in large language models (LLMs) have shown their remarkable capacities in many NLP tasks. However, their substantial size often presents challenges for deployment. This necessitates efficient techniques for model…

Computation and Language · Computer Science 2026-05-20 Robin Baki Davidsson, Pierre Nugues

When Quantization Affects Confidence of Large Language Models?

Recent studies introduced effective compression techniques for Large Language Models (LLMs) via post-training quantization or low-bit weight representation. Although quantized weights offer storage efficiency and allow for faster inference,…

Computation and Language · Computer Science 2024-05-02 Irina Proskurina, Luc Brun, Guillaume Metzler, Julien Velcin

Can Large Language Models Still Explain Themselves? Investigating the Impact of Quantization on Self-Explanations

Quantization is widely used to accelerate inference and streamline the deployment of large language models (LLMs), yet its effects on self-explanations (SEs) remain unexplored. SEs, generated by LLMs to justify their own outputs, require…

Computation and Language · Computer Science 2026-01-05 Qianli Wang, Nils Feldhus, Pepa Atanasova, Fedor Splitt, Simon Ostermann, Sebastian Möller, Vera Schmitt

Tokenizer Choice For LLM Training: Negligible or Crucial?

The recent success of Large Language Models (LLMs) has been predominantly driven by curating the training dataset composition, scaling of model architectures and dataset sizes and advancements in pretraining objectives, leaving tokenizer…

Machine Learning · Computer Science 2024-03-19 Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr

How Important is 'Perfect' English for Machine Translation Prompts?

Large language models (LLMs) have achieved top results in recent machine translation evaluations, but they are also known to be sensitive to errors and perturbations in their prompts. We systematically evaluate how both humanly plausible…

Computation and Language · Computer Science 2025-09-03 Patrícia Schmidtová, Niyati Bafna, Seth Aycock, Gianluca Vico, Wiktor Kamzela, Katharina Hämmerl, Vilém Zouhar

A Comprehensive Study on Quantization Techniques for Large Language Models

Large Language Models (LLMs) have been extensively researched and used in both academia and industry since the rise in popularity of the Transformer model, which demonstrates excellent performance in AI. However, the computational demands…

Machine Learning · Computer Science 2024-11-06 Jiedong Lang, Zhehao Guo, Shuyu Huang

Interpreting the Effects of Quantization on LLMs

Quantization offers a practical solution to deploy LLMs in resource-constraint environments. However, its impact on internal representations remains understudied, raising questions about the reliability of quantized models. In this study,…

Machine Learning · Computer Science 2025-11-21 Manpreet Singh, Hassan Sajjad

English K_Quantization of LLMs Does Not Disproportionately Diminish Multilingual Performance

For consumer usage of locally deployed LLMs, the GGUF format and k_quantization are invaluable tools for maintaining the performance of the original model while reducing it to sizes deployable with consumer-grade hardware. The number of…

Computation and Language · Computer Science 2026-01-23 Karl Audun Borgersen, Morten Goodwin

Quantifying the Impact of Translation Errors on Multilingual LLM Evaluation

Machine-translated benchmarks are widely used to assess the multilingual capabilities of large language models (LLMs), yet translation errors in these benchmarks remain underexplored, raising concerns about the reliability and comparability…

Computation and Language · Computer Science 2026-05-26 Klaudia-Doris Thellmann, Bernhard Stadler, Michael Färber, Jens Lehmann

Is Quantization a Deal-breaker? Empirical Insights from Large Code Models

The growing scale of large language models (LLMs) not only demands extensive computational resources but also raises environmental concerns due to their increasing carbon footprint. Model quantization emerges as an effective approach that…

Software Engineering · Computer Science 2025-07-15 Saima Afrin, Bowen Xu, Antonio Mastropaolo

The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead?

Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks, leading researchers to use them for time and labor-intensive analyses. However, their capability to handle highly specialized and…

Computation and Language · Computer Science 2024-10-08 Alexander S. Choi, Syeda Sabrina Akter, JP Singh, Antonios Anastasopoulos

Evaluating Quantized Large Language Models

Post-training quantization (PTQ) has emerged as a promising technique to reduce the cost of large language models (LLMs). Specifically, PTQ can effectively mitigate memory consumption and reduce computational overhead in LLMs. To meet the…

Computation and Language · Computer Science 2024-06-07 Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang