Related papers: A Multi-Objective Framework for Optimizing GPU-Ena…

An Online Fragmentation-Aware GPU Scheduler for Multi-Tenant MIG-based Clouds

The explosive growth of AI applications has created unprecedented demand for GPU resources. Cloud providers meet this demand through GPU-as-a-Service platforms that offer rentable GPU resources for running AI workloads. In this context, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-25 Marco Zambianco, Lorenzo Fasol, Roberto Doriguzzi-Corin

An Online Fragmentation-Aware Scheduler for Managing GPU-Sharing Workloads on Multi-Instance GPUs

Modern GPU workloads increasingly demand efficient resource sharing, as many jobs do not require the full capacity of a GPU. Among sharing techniques, NVIDIA's Multi-Instance GPU (MIG) offers strong resource isolation by enabling…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-19 Hsu-Tzu Ting, Jerry Chou, Ming-Hung Chen, I-Hsin Chung

Optimal Workload Placement on Multi-Instance GPUs

There is an urgent and pressing need to optimize usage of Graphics Processing Units (GPUs), which have arguably become one of the most expensive and sought-after IT resources. To help with this goal, several of the current generation of…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-11 Bekir Turkkan, Pavankumar Murali, Pavithra Harsha, Rohan Arora, Gerard Vanloo, Chandra Narayanaswami

Flex-MIG: Enabling Distributed Execution on MIG

GPU clusters in multi-tenant settings often suffer from underutilization, making GPU-sharing technologies essential for efficient resource use. Among them, NVIDIA Multi-Instance GPU (MIG) has gained traction for providing hardware-level…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-14 Myeongsu Kim, Ikjun Yeom, Younghoon Kim

On the Partitioning of GPU Power among Multi-Instances

Efficient power management in cloud data centers is essential for reducing costs, enhancing performance, and minimizing environmental impact. GPUs, critical for tasks like machine learning (ML) and GenAI, are major contributors to power…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-15 Tirth Vamja, Kaustabha Ray, Felix George, UmaMaheswari C Devi

An Analysis of Collocation on GPUs for Deep Learning Training

Deep learning training is an expensive process that extensively uses GPUs, but not all model training saturates modern powerful GPUs. Multi-Instance GPU (MIG) is a new technology introduced by NVIDIA that can partition a GPU to better-fit…

Machine Learning · Computer Science 2023-04-25 Ties Robroek, Ehsan Yousefzadeh-Asl-Miandoab, Pınar Tözün

Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading

Advances in GPU compute throughput and memory capacity bring significant opportunities to a wide range of workloads. However, efficiently utilizing these resources remains challenging, particularly because diverse application…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-10 Gabin Schieffer, Ruimin Shi, Jie Ren, Ivy Peng

MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant Systems for Machine Learning

GPU technology has been improving at an expedited pace in terms of size and performance, empowering HPC and AI/ML researchers to advance the scientific discovery process. However, this also leads to inefficient resource usage, as most GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-10 Baolin Li, Tirthak Patel, Siddarth Samsi, Vijay Gadepally, Devesh Tiwari

Improving Multi-Instance GPU Efficiency via Sub-Entry Sharing TLB Design

NVIDIA's Multi-Instance GPU (MIG) technology enables partitioning GPU computing power and memory into separate hardware instances, providing complete isolation including compute resources, caches, and memory. However, prior work identifies…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-30 Bingyao Li, Yueqi Wang, Tianyu Wang, Lieven Eeckhout, Jun Yang, Aamer Jaleel, Xulong Tang

Leveraging Multi-Instance GPUs through moldable task scheduling

NVIDIA MIG (Multi-Instance GPU) allows partitioning a physical GPU into multiple logical instances with fully-isolated resources, which can be dynamically reconfigured. This work highlights the untapped potential of MIG through moldable…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-21 Jorge Villarrubia, Luis Costero, Francisco D. Igual, Katzalin Olcoz

Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem

Multi-Instance GPU (MIG) is a new feature introduced by NVIDIA A100 GPUs that partitions one physical GPU into multiple GPU instances. With MIG, A100 can be the most cost-efficient GPU ever for serving Deep Neural Networks (DNNs). However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-24 Cheng Tan, Zhichao Li, Jian Zhang, Yu Cao, Sikai Qi, Zherui Liu, Yibo Zhu, Chuanxiong Guo

GPU-Virt-Bench: A Comprehensive Benchmarking Framework for Software-Based GPU Virtualization Systems

The proliferation of GPU-accelerated workloads, particularly in artificial intelligence and large language model (LLM) inference, has created unprecedented demand for efficient GPU resource sharing in cloud and container environments. While…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-30 Jithin VG, Ditto PS

Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration

Continuous learning (CL) has emerged as one of the most popular deep learning paradigms deployed in modern cloud GPUs. Specifically, CL has the capability to continuously update the model parameters (through model retraining) and use the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-19 Tianyu Wang, Sheng Li, Bingyao Li, Yue Dai, Ao Li, Geng Yuan, Yufei Ding, Youtao Zhang, Xulong Tang

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

In cloud machine learning (ML) inference systems, providing low latency to end-users is of utmost importance. However, maximizing server utilization and system throughput is also crucial for ML service providers as it helps lower the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-01 Yunseong Kim, Yujeong Choi, Minsoo Rhu

Efficient Resource Sharing Through GPU Virtualization on Accelerated High Performance Computing Systems

The High Performance Computing (HPC) field is witnessing a widespread adoption of Graphics Processing Units (GPUs) as co-processors for conventional homogeneous clusters. The adoption of prevalent Single-Program Multiple-Data (SPMD)…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-25 Teng Li, Vikram K. Narayana, Tarek El-Ghazawi

Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps

CPU-GPU heterogeneous systems are now commonly used in HPC (High-Performance Computing). However, improving the utilization and energy-efficiency of such systems is still one of the most critical issues. As one single program typically…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-08 Eishi Arima, Minjoon Kang, Issa Saba, Josef Weidendorfer, Carsten Trinitis, Martin Schulz

ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments

In cloud environments, GPU-based deep neural network (DNN) inference servers are required to meet the Service Level Objective (SLO) latency for each workload under a specified request rate, while also minimizing GPU resource consumption.…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-24 Munkyu Lee, Sihoon Seong, Minki Kang, Jihyuk Lee, Gap-Joo Na, In-Geol Chun, Dimitrios Nikolopoulos, Cheol-Ho Hong

Understanding GPU Resource Interference One Level Deeper

GPUs are vastly underutilized, even when running resource-intensive AI applications, as GPU kernels within each job have diverse resource profiles that may saturate some parts of a device while often leaving other parts idle. Colocating…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-17 Paul Elvinger, Foteini Strati, Natalie Enright Jerger, Ana Klimovic

A Secure and Multi-objective Virtual Machine Placement Framework for Cloud Data Centre

To facilitate cost-effective and elastic computing benefits to the cloud users, the energy-efficient and secure allocation of virtual machines (VMs) plays a significant role at the data centre. The inefficient VM Placement (VMP) and sharing…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-29 Deepika Saxena, Ishu Gupta, Jitendra Kumar, Ashutosh Kumar Singh, Xiaoqing Wen

Optimal Placement Algorithms for Virtual Machines

Cloud computing provides a computing platform for the users to meet their demands in an efficient, cost-effective way. Virtualization technologies are used in the clouds to aid the efficient usage of hardware. Virtual machines (VMs) are…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-11-24 Umesh Bellur, Chetan S Rao, Madhu Kumar SD