Related papers

Related papers: VELO: A Vector Database-Assisted Cloud-Edge Collab…

200 papers

Large language model (LLM) serving is becoming an increasingly critical workload for cloud providers. Existing LLM serving systems focus on interactive requests, such as chatbots and coding assistants, with tight latency SLO requirements.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-02-26 · Archit Patke, Dhemath Reddy, Saurabh Jha, Haoran Qiu, Christian Pinto, Chandra Narayanaswami, Zbigniew Kalbarczyk, Ravishankar Iyer

The combination of Federated Learning (FL), Multimodal Large Language Models (MLLMs), and edge-cloud computing enables distributed and real-time data processing while preserving privacy across edge devices and cloud infrastructure. However,…

Neural and Evolutionary Computing · Computer Science · 2025-02-19 · Gaith Rjouba, Hanae Elmekki, Saidul Islam, Jamal Bentahar, Rachida Dssouli

The remarkable performance of Large Language Models (LLMs) has inspired many applications, which often necessitate edge-cloud collaboration due to connectivity, privacy, and cost considerations. Traditional methods primarily focus on…

Databases · Computer Science · 2025-07-15 · Prasoon Patidar, Alex Crown, Kevin Hsieh, Yifei Xu, Tusher Chakraborty, Ranveer Chandra, Yuvraj Agarwal

Large language models (LLMs) have demonstrated impressive capabilities in language tasks, but they require high computing power and rely on static knowledge. To overcome these limitations, Retrieval-Augmented Generation (RAG) incorporates…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-02-17 · Jiaxing Li, Chi Xu, Lianchen Jia, Feng Wang, Cong Zhang, Jiangchuan Liu

Running Large Language Models (LLMs) on edge devices is crucial for reducing latency, improving real-time processing, and enhancing privacy. By performing inference directly on the device, data does not need to be sent to the cloud,…

Hardware Architecture · Computer Science · 2025-10-21 · Tianhua Xia, Sai Qian Zhang

Large Language Models (LLMs) excel in natural language processing tasks but pose significant computational and memory challenges for edge deployment due to their intensive resource demands. This work addresses the efficiency of LLM…

Hardware Architecture · Computer Science · 2025-07-02 · Zhican Wang, Hongxiang Fan, Haroon Waris, Gang Wang, Zhenyu Li, Jianfei Jiang, Yanan Sun, Guanghui He

Emerging intelligent service scenarios in 6G communication impose stringent requirements for low latency, high reliability, and privacy preservation. Generative large language models (LLMs) are gradually becoming key enablers for the…

Networking and Internet Architecture · Computer Science · 2025-05-21 · Pengyan Zhu, Tingting Yang

Large Language Models (LLMs) have demonstrated remarkable capabilities, leading to a significant increase in user demand for LLM services. However, cloud-based LLM services often suffer from high latency, unstable responsiveness, and…

Networking and Internet Architecture · Computer Science · 2025-08-04 · Jin Yang, Qiong Wu, Zhiying Feng, Zhi Zhou, Deke Guo, Xu Chen

Large Language Models (LLMs) exhibit remarkable human-like predictive capabilities. However, it is challenging to deploy LLMs to provide efficient and adaptive inference services at the edge. This paper proposes a novel Cloud-Edge…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-06-10 · Hongpeng Jin, Yanzhao Wu

High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks…

Machine Learning · Computer Science · 2023-09-13 · Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

Large Language Models (LLMs) have shown strong capabilities in language understanding and reasoning across diverse domains. Recently, there has been increasing interest in utilizing LLMs not merely as assistants in optimization tasks, but…

Neural and Evolutionary Computing · Computer Science · 2025-10-10 · Jie Zhao, Tao Wen, Kang Hao Cheong

Vision Large Language Models (VLMs) combine visual understanding with natural language processing, enabling tasks like image captioning, visual question answering, and video analysis. While VLMs show impressive capabilities across domains…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-18 · Ahmed Sharshar, Latif U. Khan, Waseem Ullah, Mohsen Guizani

Large Language Models (LLMs) have revolutionized a wide range of domains such as natural language processing, computer vision, and multi-modal tasks due to their ability to comprehend context and perform logical reasoning. However, the…

Artificial Intelligence · Computer Science · 2025-07-31 · Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen

On-device large language models (LLMs), referring to running LLMs on edge devices, have raised considerable interest since they are more cost-effective, latency-efficient, and privacy-preserving compared with the cloud paradigm.…

Networking and Internet Architecture · Computer Science · 2025-03-21 · Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang

Large Language Models (LLMs) enable various applications on edge devices such as smartphones, wearables, and embodied robots. However, their deployment often depends on expensive cloud-based APIs, creating high operational costs, which…

Robotics · Computer Science · 2025-05-29 · Yeshwanth Venkatesha, Souvik Kundu, Priyadarshini Panda

The growing adoption of Large Language Models (LLMs) across various domains has driven the demand for efficient and scalable AI-serving solutions. Deploying LLMs requires optimizations to manage their significant computational and data…

Hardware Architecture · Computer Science · 2025-03-07 · Junsoo Kim, Hunjong Lee, Geonwoo Ko, Gyubin Choi, Seri Ham, Seongmin Hong, Joo-Young Kim

Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests. This surge in demand poses significant challenges in optimizing throughput and latency while keeping costs manageable. The Key-Value…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-07-23 · Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, Junping Zhao, Ke Zhang, Minyi Guo, Jingwen Leng

Large Language Models (LLMs) in agentic workflows combine multi-step reasoning, heterogeneous tool use, and collaboration across multiple specialized agents. Existing LLM serving engines optimize individual calls in isolation, while…

Databases · Computer Science · 2026-01-21 · Junyi Shen, Noppanat Wadlom, Yao Lu

Distributed prefix caching has become a core technique for efficient LLM serving. However, for long-context requests with high cache hit ratios, retrieving reusable KVCache blocks from remote servers has emerged as a new performance…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-03-24 · Weiye Wang, Chen Chen, Junxue Zhang, Zhusheng Wang, Hui Yuan, Zixuan Guan, Xiaolong Zheng, Qizhen Weng, Yin Chen, Minyi Guo

Query optimization, which finds the optimized execution plan for a given query, is a complex planning and decision-making problem within the exponentially growing plan space in database management systems (DBMS). Traditional optimizers…

Databases · Computer Science · 2025-02-11 · Jie Tan, Kangfei Zhao, Rui Li, Jeffrey Xu Yu, Chengzhi Piao, Hong Cheng, Helen Meng, Deli Zhao, Yu Rong