Related papers: GPU-enabled Function-as-a-Service for Machine Lear…

A Data as a Service (DaaS) Model for GPU-based Data Analytics

Cloud-based services with resources to be provisioned for consumers are increasingly the norm, especially with respect to Big data, spatiotemporal data mining and application services that impose a user's agreed Quality of Service (QoS)…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-07 John Olorunfemi Abe, Burak Berk Ustundaug

Kernel-as-a-Service: A Serverless Interface to GPUs

Serverless computing has made it easier than ever to deploy applications over scalable cloud resources, all the while driving higher utilization for cloud providers. While this technique has worked well for easily divisible resources like…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-19 Nathan Pemberton, Anton Zabreyko, Zhoujie Ding, Randy Katz, Joseph Gonzalez

S-FaaS: Trustworthy and Accountable Function-as-a-Service using Intel SGX

Function-as-a-Service (FaaS) is a recent and already very popular paradigm in cloud computing. The function provider need only specify the function to be run, usually in a high-level language like JavaScript, and the service provider…

Cryptography and Security · Computer Science 2020-05-25 Fritz Alder, N. Asokan, Arseny Kurnikov, Andrew Paverd, Michael Steiner

Function-as-a-Service Performance Evaluation: A Multivocal Literature Review

Function-as-a-Service (FaaS) is one form of the serverless cloud computing paradigm and is defined through FaaS platforms (e.g., AWS Lambda) executing event-triggered code snippets (i.e., functions). Many studies that empirically evaluate…

Performance · Computer Science 2020-07-21 Joel Scheuner, Philipp Leitner

Scheduling Methods to Reduce Response Latency of Function as a Service

Function as a Service (FaaS) permits cloud customers to deploy to cloud individual functions, in contrast to complete virtual machines or Linux containers. All major cloud providers offer FaaS products (Amazon Lambda, Google Cloud…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-12 Pawel Zuk, Krzysztof Rzadca

Privacy preserving Neural Network Inference on Encrypted Data with GPUs

Machine Learning as a Service (MLaaS) has become a growing trend in recent years and several such services are currently offered. MLaaS is essentially a set of services that provides machine learning tools and capabilities as part of cloud…

Cryptography and Security · Computer Science 2019-11-27 Daniel Takabi, Robert Podschwadt, Jeff Druce, Curt Wu, Kevin Procopio

FaST-GShare: Enabling Efficient Spatio-Temporal GPU Sharing in Serverless Computing for Deep Learning Inference

Serverless computing (FaaS) has been extensively utilized for deep learning (DL) inference due to the ease of deployment and pay-per-use benefits. However, existing FaaS platforms utilize GPUs in a coarse manner for DL inferences, without…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-04 Jianfeng Gu, Yichao Zhu, Puxuan Wang, Mohak Chadha, Michael Gerndt

MQFQ-Sticky: Fair Queueing For Serverless GPU Functions

Hardware accelerators like GPUs are now ubiquitous in data centers, but are not fully supported by common cloud abstractions such as Functions as a Service (FaaS). Many popular and emerging FaaS applications such as machine learning and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-15 Alexander Fuerst, Siddharth Anil, Vishakha Dixit, Purushottam Kulkarni, Prateek Sharma

Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning

As machine learning techniques are applied to a widening range of applications, high throughput machine learning (ML) inference servers have become critical for online service applications. Such ML inference servers pose two challenges:…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-06 Seungbeom Choi, Sunho Lee, Yeonjae Kim, Jongse Park, Youngjin Kwon, Jaehyuk Huh

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

In cloud machine learning (ML) inference systems, providing low latency to end-users is of utmost importance. However, maximizing server utilization and system throughput is also crucial for ML service providers as it helps lower the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-01 Yunseong Kim, Yujeong Choi, Minsoo Rhu

gFaaS: Enabling Generic Functions in Serverless Computing

With the advent of AWS Lambda in 2014, Serverless Computing, particularly Function-as-a-Service (FaaS), has witnessed growing popularity across various application domains. FaaS enables an application to be decomposed into fine-grained…

Software Engineering · Computer Science 2024-01-22 Mohak Chadha, Paul Wieland, Michael Gerndt

Multi-Event Triggers for Serverless Computing

Function-as-a-Service (FaaS) is an event-driven serverless cloud computing model in which small, stateless functions are invoked in response to events, such as HTTP requests, new database entries, or messages. Current FaaS platforms assume…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-14 Natalie Carl, Trever Schirmer, Niklas Kowallik, Joshua Adamek, Tobias Pfandzelter, Sergio Lucia, David Bermbach

Data-driven scheduling in serverless computing to reduce response time

In Function as a Service (FaaS), a serverless computing variant, customers deploy functions instead of complete virtual machines or Linux containers. It is the cloud provider who maintains the runtime environment for these functions. FaaS…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-16 Bartłomiej Przybylski, Paweł Żuk, Krzysztof Rzadca

AIvailable: A Software-Defined Architecture for LLM-as-a-Service on Heterogeneous and Legacy GPUs

The rise of Large Language Models (LLM) has increased the need for scalable, high-performance inference systems, yet most existing frameworks assume homogeneous, resource-rich hardware, often unrealistic in academic, or resource-constrained…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-18 Pedro Antunes, Ana Rita Ortigoso, Gabriel Vieira, Daniel Fuentes, Luís Frazão, Nuno Costa, António Pereira

SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing

Function-as-a-Service (FaaS) is one of the most promising directions for the future of cloud services, and serverless functions have immediately become a new middleware for building scalable and cost-efficient microservices and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-05 Marcin Copik, Grzegorz Kwasniewski, Maciej Besta, Michal Podstawski, Torsten Hoefler

HAS-GPU: Efficient Hybrid Auto-scaling with Fine-grained GPU Allocation for SLO-aware Serverless Inferences

Serverless Computing (FaaS) has become a popular paradigm for deep learning inference due to the ease of deployment and pay-per-use benefits. However, current serverless inference platforms encounter the coarse-grained and static GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-03 Jianfeng Gu, Puxuan Wang, Isaac David Nunez Araya, Kai Huang, Michael Gerndt

Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions

Machine Learning as a Service (MLaaS) is an increasingly popular design where a company with abundant computing resources trains a deep neural network and offers query access for tasks like image classification. The challenge with this…

Cryptography and Security · Computer Science 2024-04-17 Abdulrahman Diaa, Lucas Fenaux, Thomas Humphries, Marian Dietz, Faezeh Ebrahimianghazani, Bailey Kacsmar, Xinda Li, Nils Lukas, Rasoul Akhavan Mahdavi, Simon Oya, Ehsan Amjadian, Florian Kerschbaum

FPGAs-as-a-Service Toolkit (FaaST)

Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant…

Computational Physics · Physics 2021-11-05 Dylan Sheldon Rankin, Jeffrey Krupa, Philip Harris, Maria Acosta Flechas, Burt Holzman, Thomas Klijnsma, Kevin Pedro, Nhan Tran, Scott Hauck, Shih-Chieh Hsu, Matthew Trahms, Kelvin Lin, Yu Lou, Ta-Wei Ho, Javier Duarte, Mia Liu

MLLess: Achieving Cost Efficiency in Serverless Machine Learning Training

Function-as-a-Service (FaaS) has raised a growing interest in how to "tame" serverless computing to enable domain-specific use cases such as data-intensive applications and machine learning (ML), to name a few. Recently, several systems…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-14 Pablo Gimeno Sarroca, Marc Sánchez-Artigas

rFaaS: Enabling High Performance Serverless with RDMA and Leases

High performance is needed in many computing systems, from batch-managed supercomputers to general-purpose cloud platforms. However, scientific clusters lack elastic parallelism, while clouds cannot offer competitive costs for…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-16 Marcin Copik, Konstantin Taranov, Alexandru Calotoiu, Torsten Hoefler