Related papers: Intra-node Memory Safe GPU Co-Scheduling

RTGPU: Real-Time GPU Scheduling of Hard Deadline Parallel Tasks with Fine-Grain Utilization

Many emerging cyber-physical systems, such as autonomous vehicles and robots, rely heavily on artificial intelligence and machine learning algorithms to perform important system operations. Since these highly parallel applications are…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-07 An Zou , Jing Li , Christopher D. Gill , Xuan Zhang

Effective GPU Sharing Under Compiler Guidance

Modern computing platforms tend to deploy multiple GPUs (2, 4, or more) on a single node to boost system performance, with each GPU having a large capacity of global memory and streaming multiprocessors (SMs). GPUs are an expensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-20 Chao Chen , Chris Porter , Santosh Pande

Techniques for Shared Resource Management in Systems with Throughput Processors

The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun

Understanding GPU Resource Interference One Level Deeper

GPUs are vastly underutilized, even when running resource-intensive AI applications, as GPU kernels within each job have diverse resource profiles that may saturate some parts of a device while often leaving other parts idle. Colocating…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-17 Paul Elvinger , Foteini Strati , Natalie Enright Jerger , Ana Klimovic

Power Consumption Analysis of Parallel Algorithms on GPUs

Due to their highly parallel multi-cores architecture, GPUs are being increasingly used in a wide range of computationally intensive applications. Compared to CPUs, GPUs can achieve higher performances at accelerating the programs'…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-05 Frédéric Magoulès , Abal-Kassim Cheik Ahamed , Alban Desmaison , Jean-Christophe Léchenet , François Mayer , Haifa Ben Salem , Thomas Zhu

DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime

GPUs are readily available in cloud computing and personal devices, but their use for data processing acceleration has been slowed down by their limited integration with common programming languages such as Python or Java. Moreover, using…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-20 Alberto Parravicini , Arnaud Delamare , Marco Arnaboldi , Marco D. Santambrogio

Towards Energy Efficient Co-Scheduling in HPC

Modern multi GPU HPC systems expose substantial computational capacity, yet inefficient GPU allocation often leads to wasted energy and underutilization. In practice, GPU applications exhibit heterogeneous and nonlinear scaling, making it…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-21 Zhong Zheng , Michael E. Papka , Zhiling Lan

MGPU-TSM: A Multi-GPU System with Truly Shared Memory

The sizes of GPU applications are rapidly growing. They are exhausting the compute and memory resources of a single GPU, and are demanding the move to multiple GPUs. However, the performance of these applications scales sub-linearly with…

Hardware Architecture · Computer Science 2020-08-11 Saiful A. Mojumder , Yifan Sun , Leila Delshadtehrani , Yenai Ma , Trinayan Baruah , José L. Abellán , John Kim , David Kaeli , Ajay Joshi

A Graph-Partition-Based Scheduling Policy for Heterogeneous Architectures

In order to improve system performance efficiently, a number of systems choose to equip multi-core and many-core processors (such as GPUs). Due to their discrete memory these heterogeneous architectures comprise a distributed system within…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-02-27 Hao Wu , Daniel Lohmann , Wolfgang Schröder-Preikschat

Orchestrated Co-scheduling, Resource Partitioning, and Power Capping on CPU-GPU Heterogeneous Systems via Machine Learning

CPU-GPU heterogeneous architectures are now commonly used in a wide variety of computing systems from mobile devices to supercomputers. Maximizing the throughput for multi-programmed workloads on such systems is indispensable as one single…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-08 Issa Saba , Eishi Arima , Dai Liu , Martin Schulz

Efficient Resource Sharing Through GPU Virtualization on Accelerated High Performance Computing Systems

The High Performance Computing (HPC) field is witnessing a widespread adoption of Graphics Processing Units (GPUs) as co-processors for conventional homogeneous clusters. The adoption of prevalent Single- Program Multiple-Data (SPMD)…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-25 Teng Li , Vikram K. Narayana , Tarek El-Ghazawi

A Server-based Approach for Predictable GPU Access with Improved Analysis

We propose a server-based approach to manage a general-purpose graphics processing unit (GPU) in a predictable and efficient manner. Our proposed approach introduces a GPU server that is a dedicated task to handle GPU requests from other…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-14 Hyoseung Kim , Pratyush Patel , Shige Wang , Ragunathan , Rajkumar

High-Performance and Energy-Effcient Memory Scheduler Design for Heterogeneous Systems

When multiple processor cores (CPUs) and a GPU integrated together on the same chip share the off-chip DRAM, requests from the GPU can heavily interfere with requests from the CPUs, leading to low system performance and starvation of cores.…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun , Gabriel H. Loh , Lavanya Subramanian , Kevin Chang , Onur Mutlu

Enabling predictable parallelism in single-GPU systems with persistent CUDA threads

Graphics Processing Unit, or GPUs, have been successfully adopted both for graphic computation in 3D applications, and for general purpose application (GP-GPUs), thank to their tremendous performance-per-watt. Recently, there is a big…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-03 Paolo Burgio

Boosting performance of computer vision applications through embedded GPUs on the edge

Computer vision applications, especially those using augmented reality technology, are becoming quite popular in mobile devices. However, this type of application is known as presenting significant demands regarding resources. In order to…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Fabio Diniz Rossi

Towards a High-performance and Secure Memory System and Architecture for Emerging Applications

In this dissertation, we propose a memory and computing coordinated methodology to thoroughly exploit the characteristics and capabilities of the GPU-based heterogeneous system to effectively optimize applications' performance and privacy.…

Cryptography and Security · Computer Science 2022-09-07 Zhendong Wang , Yang Hu

Improving GPU Performance Through Resource Sharing

Graphics Processing Units (GPUs) consisting of Streaming Multiprocessors (SMs) achieve high throughput by running a large number of threads and context switching among them to hide execution latencies. The number of thread blocks, and hence…

Hardware Architecture · Computer Science 2015-06-08 Vishwesh Jatala , Jayvant Anantpur , Amey Karkare

RTGPU: Real-Time Computing with Graphics Processing Units

In this work, we survey the role of GPUs in real-time systems. Originally designed for parallel graphics workloads, GPUs are now widely used in time-critical applications such as machine learning, autonomous vehicles, and robotics due to…

Hardware Architecture · Computer Science 2025-12-11 Atiyeh Gheibi-Fetrat , Amirsaeed Ahmadi-Tonekaboni , Farzam Koohi-Ronaghi , Pariya Hajipour , Sana Babayan-Vanestan , Fatemeh Fotouhi , Elahe Mortazavian-Farsani , Pouria Khajehpour-Dezfouli , Sepideh Safari , Shaahin Hessabi , Hamid Sarbazi-Azad

Co-Optimizing Performance and Memory FootprintVia Integrated CPU/GPU Memory Management, anImplementation on Autonomous Driving Platform

Cutting-edge embedded system applications, such as self-driving cars and unmanned drone software, are reliant on integrated CPU/GPU platforms for their DNNs-driven workload, such as perception and other highly parallel components. In this…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-20 Soroush Bateni , Zhendong Wang , Yuankun Zhu , Yang Hu , Cong Liu

ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost effective properties. Existing serverless computing, however, lacks effective job scheduling methods to…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-26 Xinning Hui , Yuanchao Xu , Zhishan Guo , Xipeng Shen