English
Related papers

Related papers: MobileQuant: Mobile-friendly Quantization for On-d…

200 papers

Large Language Models (LLMs) stand out for their impressive performance in intricate language modeling tasks. However, their demanding computational and memory needs pose obstacles for broad use on edge devices. Quantization is then…

Machine Learning · Computer Science 2025-04-22 Xuan Shen , Peiyan Dong , Lei Lu , Zhenglun Kong , Zhengang Li , Ming Lin , Chao Wu , Yanzhi Wang

Deploying Large Language Models (LLMs) on edge devices enhances privacy but faces performance hurdles due to limited resources. We introduce a systematic methodology to evaluate on-device LLMs, balancing capability, efficiency, and resource…

Weight-only quantization has become a standard approach for efficiently serving large language models (LLMs). However, existing methods fail to efficiently compress models to binary (1-bit) levels, as they either require large amounts of…

Machine Learning · Computer Science 2026-05-19 Hyochan Chong , Dongkyu Kim , Changdong Kim , Minseop Choi

Despite the growing interest in Small Language Models (SLMs) as resource-efficient alternatives to Large Language Models (LLMs), their deployment on edge devices remains challenging due to unresolved efficiency gaps in model compression.…

Machine Learning · Computer Science 2025-11-18 Jiacheng Wang , Yejun Zeng , Jinyang Guo , Yuqing Ma , Aishan Liu , Xianglong Liu

Large language models (LLMs) have revolutionized natural language processing tasks. However, their practical deployment is hindered by their immense memory and computation requirements. Although recent post-training quantization (PTQ)…

Machine Learning · Computer Science 2024-03-19 Wenqi Shao , Mengzhao Chen , Zhaoyang Zhang , Peng Xu , Lirui Zhao , Zhiqian Li , Kaipeng Zhang , Peng Gao , Yu Qiao , Ping Luo

Deploying Large Language Models (LLMs) on edge or mobile devices offers significant benefits, such as enhanced data privacy and real-time processing capabilities. However, it also faces critical challenges due to the substantial memory…

Machine Learning · Computer Science 2024-05-07 Yu Mao , Weilan Wang , Hongchao Du , Nan Guan , Chun Jason Xue

A growing trend has emerged in designing high-quality Small Language Models (SLMs) with a few million parameters. This trend is driven by the increasing concerns over cloud costs, privacy, and latency. Considering that full parameter…

Machine Learning · Computer Science 2025-07-03 Xuan Shen , Peiyan Dong , Zhenglun Kong , Yifan Gong , Changdi Yang , Zhaoyang Han , Yanyue Xie , Lei Lu , Cheng Lyu , Chao Wu , Yanzhi Wang , Pu Zhao

Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, existing methods cannot maintain accuracy and hardware efficiency at the same…

Computation and Language · Computer Science 2024-04-03 Guangxuan Xiao , Ji Lin , Mickael Seznec , Hao Wu , Julien Demouth , Song Han

Large language models (LLMs) have transformed numerous AI applications. On-device LLM is becoming increasingly important: running LLMs locally on edge devices can reduce the cloud computing cost and protect users' privacy. However, the…

Computation and Language · Computer Science 2026-04-28 Ji Lin , Jiaming Tang , Haotian Tang , Shang Yang , Wei-Ming Chen , Wei-Chen Wang , Guangxuan Xiao , Xingyu Dang , Chuang Gan , Song Han

Large Language Models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of auto-regressive text generation process. This paper addresses these challenges by focusing on…

Machine Learning · Computer Science 2024-02-21 Yuxuan Yue , Zhihang Yuan , Haojie Duanmu , Sifan Zhou , Jianlong Wu , Liqiang Nie

Deploying Large Language Models (LLMs) on edge devices is increasingly important, as it eliminates reliance on network connections, reduces expensive API calls, and enhances user privacy. However, on-device deployment is challenging due to…

Machine Learning · Computer Science 2025-05-26 Yijiang Liu , Hengyu Fang , Liulu He , Rongyu Zhang , Yichuan Bai , Yuan Du , Li Du

The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints…

Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques,…

Computation and Language · Computer Science 2024-06-07 Renren Jin , Jiangcun Du , Wuwei Huang , Wei Liu , Jian Luan , Bin Wang , Deyi Xiong

Deploying Large Language Models (LLMs) on resource-constrained edge devices like the Raspberry Pi presents challenges in computational efficiency, power consumption, and response latency. This paper explores quantization-based optimization…

Machine Learning · Computer Science 2025-04-04 Mahsa Ardakani , Jinendra Malekar , Ramtin Zand

Quantization is an effective technique to reduce the deployment cost of large language models (LLMs), and post-training quantization (PTQ) has been widely studied due to its efficiency. However, existing PTQ methods are limited by their…

Machine Learning · Computer Science 2025-09-30 Qitao Tan , Xiaoying Song , Jin Lu , Guoming Li , Jun Liu , Lingzi Hong , Caiwen Ding , Jundong Li , Xiaoming Zhai , Shaoyi Huang , Wei Niu , Geng Yuan

Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive sizes hinder deployment on resource-constrained devices. To reduce their computational and memory burden, various compression…

Machine Learning · Computer Science 2026-05-18 Dung Anh Hoang , Cuong Pham , Cuong Nguyen , Trung le , Jianfei Cai , Thanh-Toan Do

Large Language Models (LLMs) with multimodal capabilities have revolutionized vision-language tasks, but their deployment often requires huge memory and computational resources. While post-training quantization (PTQ) has successfully…

Computer Vision and Pattern Recognition · Computer Science 2025-10-28 Shubhang Bhatnagar , Andy Xu , Kar-Han Tan , Narendra Ahuja

Large language models (LLMs) have achieved remarkable advancements in natural language processing, showcasing exceptional performance across various tasks. However, the expensive memory and computational requirements present significant…

Artificial Intelligence · Computer Science 2025-11-13 Ruihao Gong , Yifu Ding , Zining Wang , Chengtao Lv , Xingyu Zheng , Jinyang Du , Haotong Qin , Jinyang Guo , Michele Magno , Xianglong Liu

The advent of large language models (LLMs) revolutionized natural language processing applications, and running LLMs on edge devices has become increasingly attractive for reasons including reduced latency, data localization, and…

Computation and Language · Computer Science 2024-09-17 Jiajun Xu , Zhiyuan Li , Wei Chen , Qun Wang , Xin Gao , Qi Cai , Ziyuan Ling

Due to their large size, generative Large Language Models (LLMs) require significant computing and storage resources. This paper introduces a new post-training quantization method, GPTQT, to reduce memory usage and enhance processing speed…

Machine Learning · Computer Science 2024-07-04 Yipin Guo , Yilin Lang , Qinyuan Ren
‹ Prev 1 2 3 10 Next ›