Related papers: Serving deep learning models in a serverless platf…
Recent developments in Generative AI, Computer Vision, and Natural Language Processing have led to an increased integration of AI models into various products. This widespread adoption of AI requires significant efforts in deploying these…
This review report discusses the cold start latency in serverless inference and existing solutions. It particularly reviews the ServerlessLLM method, a system designed to address the cold start problem in serverless inference for large…
This paper explores serverless cloud computing for double machine learning. Being based on repeated cross-fitting, double machine learning is particularly well suited to exploit the high level of parallelism achievable with serverless…
Serverless computing offers an event driven pay-as-you-go framework for application development. A key selling point is the concept of no back-end server management, allowing developers to focus on application functionality. This is…
Data processing systems are increasingly deployed in the cloud. While monolithic systems run fully on virtual servers, recent systems embrace cloud infrastructure and utilize the disaggregation of compute and storage to scale them…
The appeal of serverless (FaaS) has triggered a growing interest on how to use it in data-intensive applications such as ETL, query processing, or machine learning (ML). Several systems exist for training large-scale ML models on top of…
Machine learning (ML) is an important part of modern data science applications. Data scientists today have to manage the end-to-end ML life cycle that includes both model training and model serving, the latter of which is essential, as it…
Organisations are increasingly putting machine learning models into production at scale. The increasing popularity of serverless scale-to-zero paradigms presents an opportunity for deploying machine learning models to help mitigate…
The promise of ultimate elasticity and operational simplicity of serverless computing has recently lead to an explosion of research in this area. In the context of data analytics, the concept sounds appealing, but due to the limitations of…
Serverless computing has emerged as a compelling solution for cloud-based model inference. However, as modern large language models (LLMs) continue to grow in size, existing serverless platforms often face substantial model startup…
Serverless architectures organized around loosely-coupled function invocations represent an emerging design for many applications. Recent work mostly focuses on user-facing products and event-driven processing pipelines. In this paper, we…
Next generation technologies such as smart healthcare, self-driving cars, and smart cities require new approaches to deal with the network traffic generated by the Internet of Things (IoT) devices, as well as efficient programming models to…
Serverless computing has gained a strong traction in the cloud computing community in recent years. Among the many benefits of this novel computing model, the rapid auto-scaling capability of user applications takes prominence. However, the…
Serverless cloud is an innovative cloud service model that frees customers from most cloud management duties. It also offers the same advantages as other cloud models but at much lower costs. As a result, the serverless cloud has been…
Serverless computing is transforming cloud application development, but the performance-cost trade-offs of control plane designs remain poorly understood due to a lack of open, cross-platform benchmarks and detailed system analyses. In this…
In this paper, we propose DEEPSERVE, a scalable and serverless AI platform designed to efficiently serve large language models (LLMs) at scale in cloud environments. DEEPSERVE addresses key challenges such as resource allocation, serving…
Serverless computing has transformed cloud application deployment by introducing a fine-grained, event-driven execution model that abstracts away infrastructure management. Its on-demand nature makes it especially appealing for…
End-users can get functions-as-a-service from serverless platforms, which promise lower hosting costs, high availability, fault tolerance, and dynamic flexibility for hosting individual functions known as microservices. Machine learning…
Serverless computing (also known as functions as a service) is a new cloud computing abstraction that makes it easier to write robust, large-scale web services. In serverless computing, programmers write what are called serverless…
The widespread deployment of large-scale, compute-intensive applications such as high-performance computing, artificial intelligence, and big data is leading to convergence between cloud and high-performance computing infrastructures. Cloud…