
Related papers: Optimizing Prediction Serving on Low-Latency Serve…

200 papers

Serverless computing that runs functions with auto-scaling is a popular task execution pattern in the cloud-native era. By connecting serverless functions into workflows, tenants can achieve complex functionality. Prior research adopts the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-01 Zijun Li, Chuhao Xu, Quan Chen, Jieru Zhao, Chen Chen, Minyi Guo

Network traffic analysis increasingly uses complex machine learning models as the internet consolidates and traffic gets more encrypted. However, over high-bandwidth networks, flows can easily arrive faster than model inference rates. The…

Networking and Internet Architecture · Computer Science 2024-10-25 Shinan Liu, Ted Shaowang, Gerry Wan, Jeewon Chae, Jonatas Marques, Sanjay Krishnan, Nick Feamster

In recent years Serverless Computing has emerged as a compelling cloud based model for the development of a wide range of data-intensive applications. However, rapid container provisioning introduces non-trivial challenges for FaaS cloud…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-28 Dimitrios Tomaras, Michail Tsenos, Vana Kalogeraki

Large Language Models (LLMs) have revolutionized numerous domains, driving the rise of Language-Model-as-a-Service (LMaaS) platforms that process millions of queries daily. These platforms must minimize latency and meet Service Level…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-21 Zhihan Jiang, Yujie Huang, Guangba Yu, Junjie Huang, Jiazhen Gu, Michael R. Lyu

Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-01 Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica

Machine Learning models are often composed of pipelines of transformations. While this design allows single model components to be executed efficiently at training time, prediction serving has different requirements such as low latency, high…

We are witnessing an increasing trend towards using Machine Learning (ML) based prediction systems, spanning across different application domains, including product recommendation systems, personal assistant devices, facial recognition, etc.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-24 Jashwant Raj Gunasekaran, Prashanth Thinakaran, Cyan Subhra Mishra, Mahmut Taylan Kandemir, Chita R. Das

Serverless computing has emerged as a compelling new paradigm of cloud computing models in recent years. It promises the user services at large scale and low cost while eliminating the need for infrastructure management. On cloud provider…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-01 Lucia Schuler, Somaya Jamil, Niklas Kühl

Configuring a storage system to better serve an application is a challenging task complicated by a multidimensional, discrete configuration space and the high cost of space exploration (e.g., by running the application with different…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-06-11 Lauro Beltrão Costa, Abmar Barros, Samer Al-Kiswany, Hao Yang, Emalayan Vairavanathan, Matei Ripeanu

This paper presents ServerlessLLM, a distributed system designed to support low-latency serverless inference for Large Language Models (LLMs). By harnessing the substantial near-GPU storage and memory capacities of inference servers,…

Machine Learning · Computer Science 2024-07-26 Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai

Serverless computing has transformed cloud application deployment by introducing a fine-grained, event-driven execution model that abstracts away infrastructure management. Its on-demand nature makes it especially appealing for…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-14 Chanh Nguyen, Monowar Bhuyan, Erik Elmroth

Serverless computing eliminates infrastructure management overhead but introduces significant challenges regarding cold start latency and resource utilization. Traditional static resource allocation often leads to inefficiencies under…

Artificial Intelligence · Computer Science 2026-04-08 Zeyu Wang, Cuiqianhe Du, Renyue Zhang, Kejian Tong, Qi He, Qiyuan Tian

Computing servers have played a key role in developing and processing emerging compute-intensive applications in recent years. Consolidating multiple virtual machines (VMs) inside one server to run various applications introduces severe…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-29 Darong Huang, Luis Costero, Ali Pahlevan, Marina Zapater, David Atienza

Recent developments in Generative AI, Computer Vision, and Natural Language Processing have led to an increased integration of AI models into various products. This widespread adoption of AI requires significant efforts in deploying these…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-23 Kamil Kojs

Serverless computing has gained popularity in edge computing due to its flexible features, including the pay-per-use pricing model, auto-scaling capabilities, and multi-tenancy support. Complex Serverless-based applications typically rely…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-01 Ke Luo, Tao Ouyang, Zhi Zhou, Xu Chen

The recent advances in LLMs bring a strong demand for efficient system support to improve overall serving efficiency. As LLM inference scales towards multiple GPUs and even multiple compute nodes, various coordination patterns, such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-18 Hongyi Jin, Ruihang Lai, Charlie F. Ruan, Yingcheng Wang, Todd C. Mowry, Xupeng Miao, Zhihao Jia, Tianqi Chen

The rapid expansion of AI inference services in the cloud necessitates a robust scalability solution to manage dynamic workloads and maintain high performance. This study proposes a comprehensive scalability optimization framework for cloud…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-23 Yihong Jin, Ze Yang

This review report discusses the cold start latency in serverless inference and existing solutions. It particularly reviews the ServerlessLLM method, a system designed to address the cold start problem in serverless inference for large…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-26 Himel Ghosh

Serverless computing has gained strong traction in the cloud computing community in recent years. Among the many benefits of this novel computing model, the rapid auto-scaling capability of user applications takes prominence. However, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-23 Anupama Mampage, Shanika Karunasekera, Rajkumar Buyya

In machine learning (ML), the inference phase is the process of applying pre-trained models to new, unseen data with the objective of making predictions. During the inference phase, end-users interact with ML services to gain insights,…

Machine Learning · Computer Science 2024-11-18 Pasquale De Rosa, Yérom-David Bromberg, Pascal Felber, Djob Mvondo, Valerio Schiavoni