Related papers: Leveraging Multi-Instance GPUs through moldable ta…

Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem

Multi-Instance GPU (MIG) is a new feature introduced by NVIDIA A100 GPUs that partitions one physical GPU into multiple GPU instances. With MIG, A100 can be the most cost-efficient GPU ever for serving Deep Neural Networks (DNNs). However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-24 Cheng Tan, Zhichao Li, Jian Zhang, Yu Cao, Sikai Qi, Zherui Liu, Yibo Zhu, Chuanxiong Guo

An Online Fragmentation-Aware Scheduler for Managing GPU-Sharing Workloads on Multi-Instance GPUs

Modern GPU workloads increasingly demand efficient resource sharing, as many jobs do not require the full capacity of a GPU. Among sharing techniques, NVIDIA's Multi-Instance GPU (MIG) offers strong resource isolation by enabling…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-19 Hsu-Tzu Ting, Jerry Chou, Ming-Hung Chen, I-Hsin Chung

An Online Fragmentation-Aware GPU Scheduler for Multi-Tenant MIG-based Clouds

The explosive growth of AI applications has created unprecedented demand for GPU resources. Cloud providers meet this demand through GPU-as-a-Service platforms that offer rentable GPU resources for running AI workloads. In this context, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-25 Marco Zambianco, Lorenzo Fasol, Roberto Doriguzzi-Corin

An Analysis of Collocation on GPUs for Deep Learning Training

Deep learning training is an expensive process that extensively uses GPUs, but not all model training saturates modern powerful GPUs. Multi-Instance GPU (MIG) is a new technology introduced by NVIDIA that can partition a GPU to better-fit…

Machine Learning · Computer Science 2023-04-25 Ties Robroek, Ehsan Yousefzadeh-Asl-Miandoab, Pınar Tözün

Flex-MIG: Enabling Distributed Execution on MIG

GPU clusters in multi-tenant settings often suffer from underutilization, making GPU-sharing technologies essential for efficient resource use. Among them, NVIDIA Multi-Instance GPU (MIG) has gained traction for providing hardware-level…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-14 Myeongsu Kim, Ikjun Yeom, Younghoon Kim

A Multi-Objective Framework for Optimizing GPU-Enabled VM Placement in Cloud Data Centers with Multi-Instance GPU Technology

The extensive use of GPUs in cloud computing and the growing need for multitenancy have driven the development of innovative solutions for efficient GPU resource management. Multi-Instance GPU (MIG) technology from NVIDIA enables shared GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-05 Ahmad Siavashi, Mahmoud Momtazpour

On the Partitioning of GPU Power among Multi-Instances

Efficient power management in cloud data centers is essential for reducing costs, enhancing performance, and minimizing environmental impact. GPUs, critical for tasks like machine learning (ML) and GenAI, are major contributors to power…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-15 Tirth Vamja, Kaustabha Ray, Felix George, UmaMaheswari C Devi

MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs

New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both…

Machine Learning · Computer Science 2023-01-03 Huaizheng Zhang, Yuanming Li, Wencong Xiao, Yizheng Huang, Xing Di, Jianxiong Yin, Simon See, Yong Luo, Chiew Tong Lau, Yang You

Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading

Advances in GPU compute throughput and memory capacity bring significant opportunities to a wide range of workloads. However, efficiently utilizing these resources remains challenging, particularly because diverse application…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-10 Gabin Schieffer, Ruimin Shi, Jie Ren, Ivy Peng

Exploiting Dependency and Parallelism: Real-Time Scheduling and Analysis for GPU Tasks

With the rapid advancement of Artificial Intelligence, the Graphics Processing Unit (GPU) has become increasingly essential across a growing number of safety-critical application domains. Applying a GPU is indispensable for parallel…

Operating Systems · Computer Science 2026-02-25 Yuanhai Zhang, Songyang He, Ruizhe Gou, Mingyue Cui, Boyang Li, Shuai Zhao, Kai Huang

A comprehensive evaluation of spatial co-execution on GPUs using MPS and MIG technologies

To mitigate the increasingly common underutilization of computational resources in modern GPUs, spatial sharing methods enable multiple applications to use them simultaneously. This work presents a comprehensive evaluation of NVIDIA's…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-30 Jorge Villarrubia, Luis Costero, Francisco D. Igual, Katzalin Olcoz

MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant Systems for Machine Learning

GPU technology has been improving at an expedited pace in terms of size and performance, empowering HPC and AI/ML researchers to advance the scientific discovery process. However, this also leads to inefficient resource usage, as most GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-10 Baolin Li, Tirthak Patel, Siddarth Samsi, Vijay Gadepally, Devesh Tiwari

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

In cloud machine learning (ML) inference systems, providing low latency to end-users is of utmost importance. However, maximizing server utilization and system throughput is also crucial for ML service providers as it helps lower the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-01 Yunseong Kim, Yujeong Choi, Minsoo Rhu

PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers

NVIDIA's Multi-Instance GPU (MIG) is a feature that enables system designers to reconfigure one large GPU into multiple smaller GPU slices. This work characterizes this emerging GPU and evaluates its effectiveness in designing…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-02 Gwangoo Yeo, Jiin Kim, Yujeong Choi, Minsoo Rhu

Mapping Large Memory-constrained Workflows onto Heterogeneous Platforms

Scientific workflows are often represented as directed acyclic graphs (DAGs), where vertices correspond to tasks and edges represent the dependencies between them. Since these graphs are often large in both the number of tasks and their…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-15 Svetlana Kulagina, Henning Meyerhenke, Anne Benoit

Improving Multi-Instance GPU Efficiency via Sub-Entry Sharing TLB Design

NVIDIA's Multi-Instance GPU (MIG) technology enables partitioning GPU computing power and memory into separate hardware instances, providing complete isolation including compute resources, caches, and memory. However, prior work identifies…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-30 Bingyao Li, Yueqi Wang, Tianyu Wang, Lieven Eeckhout, Jun Yang, Aamer Jaleel, Xulong Tang

Optimization and Reoptimization in Scheduling Problems

Parallel machine scheduling has been extensively studied in the past decades, with applications ranging from production planning to job processing in large computing clusters. In this work we study some of these fundamental optimization…

Data Structures and Algorithms · Computer Science 2015-09-08 Yael Mordechai

A Cost Effective Reliability Aware Scheduler for Task Graphs in Multi-Cloud System

Many scientific workflows can be represented by a Directed Acyclic Graph (DAG) where each node represents a task, and there will be a directed edge between two tasks if and only if there is a dependency relationship between the two i.e. the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-20 Atharva Tekawade, Suman Banerjee

RTGPU: Real-Time GPU Scheduling of Hard Deadline Parallel Tasks with Fine-Grain Utilization

Many emerging cyber-physical systems, such as autonomous vehicles and robots, rely heavily on artificial intelligence and machine learning algorithms to perform important system operations. Since these highly parallel applications are…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-07 An Zou, Jing Li, Christopher D. Gill, Xuan Zhang

Scheduling Splittable Jobs on Configurable Machines

Motivated by deep neural network applications, we study the problem of scheduling splittable jobs (e.g., neural network inference tasks) on configurable machines (e.g., multi-instance GPUs). We are given $n$ jobs and a set $C$ of…

Data Structures and Algorithms · Computer Science 2023-12-12 Matthew Casey, Rajmohan Rajaraman, David Stalfa