Related papers: Edge Intelligence Optimization for Large Language …

Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities

Large language models (LLMs) have advanced rapidly, emerging as versatile tools across fields thanks to their exceptional language understanding, generation, and reasoning capabilities. However, performing LLM inference at the network edge…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-04-28 · Zhixiong Chen, Bingjie Zhu, Jiangzhou Wang, Hyundong Shin, Arumugam Nallanathan, Dusit Niyato

A Review on Edge Large Language Models: Design, Execution, and Applications

Large language models (LLMs) have revolutionized natural language processing with their exceptional understanding, synthesizing, and reasoning capabilities. However, deploying LLMs on resource-constrained edge devices presents significant…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-02-25 · Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen

EdgeShard: Efficient LLM Inference via Collaborative Edge Computing

Large language models (LLMs) have shown great potential in natural language processing and content generation. However, current LLMs heavily rely on cloud computing, leading to prolonged latency, high bandwidth cost, and privacy concerns.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-05-24 · Mingjin Zhang, Jiannong Cao, Xiaoming Shen, Zeyang Cui

Large Language Models Empowered Autonomous Edge AI for Connected Intelligence

The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world. Edge artificial…

Information Theory · Computer Science · 2023-12-27 · Yifei Shen, Jiawei Shao, Xinjie Zhang, Zehong Lin, Hao Pan, Dongsheng Li, Jun Zhang, Khaled B. Letaief

Towards Edge General Intelligence via Large Language Models: Opportunities and Challenges

Edge Intelligence (EI) has been instrumental in delivering real-time, localized services by leveraging the computational capabilities of edge networks. The integration of Large Language Models (LLMs) empowers EI to evolve into the next…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-03-07 · Handi Chen, Weipeng Deng, Shuo Yang, Jinfeng Xu, Zhihan Jiang, Edith C. H. Ngai, Jiangchuan Liu, Xue Liu

Mobile Edge Intelligence for Large Language Models: A Contemporary Survey

On-device large language models (LLMs), referring to running LLMs on edge devices, have raised considerable interest since they are more cost-effective, latency-efficient, and privacy-preserving compared with the cloud paradigm.…

Networking and Internet Architecture · Computer Science · 2025-03-21 · Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang

Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding

Large language model (LLM) inference at the network edge is a promising serving paradigm that leverages distributed edge resources to run inference near users and enhance privacy. Existing edge-based LLM inference systems typically adopt…

Systems and Control · Electrical Eng. & Systems · 2025-10-14 · Bingjie Zhu, Zhixiong Chen, Liqiang Zhao, Hyundong Shin, Arumugam Nallanathan

Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques

Large Language Models (LLMs) have revolutionized many areas of artificial intelligence (AI), but their substantial resource requirements limit their deployment on mobile and edge devices. This survey paper provides a comprehensive overview…

Machine Learning · Computer Science · 2025-09-03 · Sanjay Surendranath Girija, Shashank Kapoor, Lakshit Arora, Dipen Pradhan, Aman Raj, Ankit Shetgaonkar

Camel: Energy-Aware LLM Inference on Resource-Constrained Devices

Most Large Language Models (LLMs) are currently deployed in the cloud, with users relying on internet connectivity for access. However, this paradigm faces challenges such as network latency, privacy concerns, and bandwidth limits. Thus,…

Networking and Internet Architecture · Computer Science · 2025-08-14 · Hao Xu, Long Peng, Shezheng Song, Xiaodong Liu, Ma Jun, Shasha Li, Jie Yu, Xiaoguang Mao

Energy Considerations of Large Language Model Inference and Efficiency Optimizations

As large language models (LLMs) scale in size and adoption, their computational and environmental costs continue to rise. Prior benchmarking efforts have primarily focused on latency reduction in idealized settings, often overlooking the…

Computation and Language · Computer Science · 2025-04-25 · Jared Fernandez, Clara Na, Vashisth Tiwari, Yonatan Bisk, Sasha Luccioni, Emma Strubell

Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach

With the proliferation of edge devices, there is a significant increase in attack surface on these devices. The decentralized deployment of threat intelligence on edge devices, coupled with adaptive machine learning techniques such as the…

Cryptography and Security · Computer Science · 2024-10-10 · Syed Mhamudul Hasan, Alaa M. Alotaibi, Sajedul Talukder, Abdur R. Shahid

Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge

The rapid rise of Language Models (LMs) has expanded the capabilities of natural language processing, powering applications from text generation to complex decision-making. While state-of-the-art LMs often boast hundreds of billions of…

Machine Learning · Computer Science · 2025-11-24 · Maximilian Abstreiter, Sasu Tarkoma, Roberto Morabito

Toward Edge General Intelligence with Multiple-Large Language Model (Multi-LLM): Architecture, Trust, and Orchestration

Edge computing enables real-time data processing closer to its source, thus improving the latency and performance of edge-enabled AI applications. However, traditional AI models often fall short when dealing with complex, dynamic tasks that…

Networking and Internet Architecture · Computer Science · 2025-07-02 · Haoxiang Luo, Yinqiu Liu, Ruichen Zhang, Jiacheng Wang, Gang Sun, Dusit Niyato, Hongfang Yu, Zehui Xiong, Xianbin Wang, Xuemin Shen

Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment

This article surveys Cognitive Edge Computing as a practical and methodical pathway for deploying reasoning-capable Large Language Models (LLMs) and autonomous AI agents on resource-constrained devices at the network edge. We present a…

Machine Learning · Computer Science · 2025-11-10 · Xubin Wang, Qing Li, Weijia Jia

Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency

Deploying Large Language Models (LLMs) on edge devices presents significant challenges due to computational constraints, memory limitations, inference speed, and energy consumption. Model quantization has emerged as a key technique to…

Computers and Society · Computer Science · 2025-04-07 · Erik Johannes Husom, Arda Goknil, Merve Astekin, Lwin Khin Shar, Andre Kåsen, Sagar Sen, Benedikt Andreas Mithassel, Ahmet Soylu

Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use

Large Language Models (LLMs) are increasingly deployed in production, contributing towards shifting the burden in terms of computational resources and energy demands from training to inference. While prior work has examined the energy cost…

Machine Learning · Computer Science · 2026-02-02 · Julien Delavande, Regis Pierrard, Sasha Luccioni

EdgeReasoning: Characterizing Reasoning LLM Deployment on Edge GPUs

Edge intelligence paradigm is increasingly demanded by the emerging autonomous systems, such as robotics. Beyond ensuring privacy-preserving operation and resilience in connectivity-limited environments, edge deployment offers significant…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-11-05 · Benjamin Kubwimana, Qijing Huang

On Optimal Caching and Model Multiplexing for Large Model Inference

Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges. In particular, the large-scale deployment of these models is…

Machine Learning · Computer Science · 2023-08-30 · Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao

MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices

Large language models (LLMs) have demonstrated exceptional performance across a variety of tasks. However, their substantial scale leads to significant computational resource consumption during inference, resulting in high costs.…

Machine Learning · Computer Science · 2025-06-13 · Zhaode Wang, Jingbang Yang, Xinyu Qian, Shiwen Xing, Xiaotang Jiang, Chengfei Lv, Shengyu Zhang

Evolutionary thoughts: integration of large language models and evolutionary algorithms

Large Language Models (LLMs) have unveiled remarkable capabilities in understanding and generating both natural language and code, but LLM reasoning is prone to hallucination and struggles with complex, novel scenarios, often getting stuck…

Neural and Evolutionary Computing · Computer Science · 2025-05-12 · Antonio Jimeno Yepes, Pieter Barnard