English
Related papers

Related papers: Understanding GPU Resource Interference One Level …

200 papers

In order to satisfy timing constraints, modern real-time applications require massively parallel accelerators such as General Purpose Graphic Processing Units (GPGPUs). Generation after generation, the number of computing clusters made…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-24 Houssam-Eddine Zahaf , Ignacio Sanudo Olmedo , Jayati Singh , Nicola Capodieci , Sebastien Faucou

Collocating deep learning training tasks improves GPU utilization but risks resource contention, severe slowdowns, and out-of-memory (OOM) failures. Accurate memory estimation is essential for robust collocation, and GPU utilization…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-29 Ehsan Yousefzadeh-Asl-Miandoab , Reza Karimzadeh , Danyal Yorulmaz , Bulat Ibragimov , Pınar Tözün

Modern computing platforms tend to deploy multiple GPUs (2, 4, or more) on a single node to boost system performance, with each GPU having a large capacity of global memory and streaming multiprocessors (SMs). GPUs are an expensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-20 Chao Chen , Chris Porter , Santosh Pande

The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun

GPUs in High-Performance Computing systems remain under-utilised due to the unavailability of schedulers that can safely schedule multiple applications to share the same GPU. The research reported in this paper is motivated to improve the…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-12-14 Carlos Reano , Federico Silla , Dimitrios S. Nikolopoulos , Blesson Varghese

There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g.\ OpenCL) do not mandate fair scheduling, and…

Programming Languages · Computer Science 2017-07-10 Tyler Sorensen , Hugues Evrard , Alastair F. Donaldson

Machine learning (ML) inference serving systems can schedule requests to improve GPU utilization and to meet service level objectives (SLOs) or deadlines. However, improving GPU utilization may compromise latency-sensitive scheduling, as…

Machine Learning · Computer Science 2025-12-25 Haidong Zhao , Nikolaos Georgantas

In recent years, machine intelligence (MI) applications have emerged as a major driver for the computing industry. Optimizing these workloads is important but complicated. As memory demands grow and data movement overheads increasingly…

Efficient scheduling of distributed deep learning (DL) jobs in large GPU clusters is crucial for resource efficiency and job performance. While server sharing among jobs improves resource utilization, interference among co-located DL jobs…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-28 Xiaoyang Zhao , Chuan Wu

There is an urgent and pressing need to optimize usage of Graphical Processing Units (GPUs), which have arguably become one of the most expensive and sought after IT resources. To help with this goal, several of the current generation of…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-11 Bekir Turkkan , Pavankumar Murali , Pavithra Harsha , Rohan Arora , Gerard Vanloo , Chandra Narayanaswami

Large-scale machine learning workloads increasingly rely on multi-GPU systems, yet their performance is often limited by an overlooked component: the CPU. Through a detailed study of modern large language model (LLM) inference and serving…

Hardware Architecture · Computer Science 2026-05-26 Euijun Chung , Yuxiao Jia , Aaron Jezghani , Hyesoon Kim

Recent advancements in Large Language Models (LLMs) have led to increasingly diverse requests, accompanied with varying resource (compute and memory) demands to serve them. However, this in turn degrades the cost-efficiency of LLM serving…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-06 Youhe Jiang , Fangcheng Fu , Xiaozhe Yao , Guoliang He , Xupeng Miao , Ana Klimovic , Bin Cui , Binhang Yuan , Eiko Yoneki

GPUs are essential to accelerating the latency-sensitive deep neural network (DNN) inference workloads in cloud datacenters. To fully utilize GPU resources, spatial sharing of GPUs among co-located DNN inference workloads becomes…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-04 Fei Xu , Jianian Xu , Jiabin Chen , Li Chen , Ruitao Shang , Zhi Zhou , Fangming Liu

Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU accelerators have been collectively constructed into a GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-02 Wei Gao , Qinghao Hu , Zhisheng Ye , Peng Sun , Xiaolin Wang , Yingwei Luo , Tianwei Zhang , Yonggang Wen

This article features extended summaries and retrospectives of some of the recent research done by our research group, SAFARI, on (1) various critical problems in memory systems and (2) how memory system bottlenecks affect graphics…

Hardware Architecture · Computer Science 2018-05-30 Onur Mutlu , Saugata Ghose , Rachata Ausavarungnirun

Efficient LLM inference on resource-constrained devices presents significant challenges in compute and memory utilization. Due to limited GPU memory, existing systems offload model weights to CPU memory, incurring substantial I/O overhead…

Machine Learning · Computer Science 2025-05-22 Xiangwen Zhuge , Xu Shen , Zeyu Wang , Fan Dang , Xuan Ding , Danyang Li , Yahui Han , Tianxiang Hao , Zheng Yang

Contemporary GPUs allow concurrent execution of small computational kernels in order to prevent idling of GPU resources. Despite the potential concurrency between independent kernels, the order in which kernels are issued to the GPU will…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-26 Teng Li , Vikram K. Narayana , Tarek El-Ghazawi

GPU singletasking is becoming increasingly inefficient and unsustainable as hardware capabilities grow and workloads diversify. We are now at an inflection point where GPUs must embrace multitasking, much like CPUs did decades ago, to meet…

Operating Systems · Computer Science 2025-08-13 Jiarong Xing , Yifan Qiao , Simon Mo , Xingqi Cui , Gur-Eyal Sela , Yang Zhou , Joseph Gonzalez , Ion Stoica

Modern GPU workloads increasingly demand efficient resource sharing, as many jobs do not require the full capacity of a GPU. Among sharing techniques, NVIDIA's Multi-Instance GPU (MIG) offers strong resource isolation by enabling…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-19 Hsu-Tzu Ting , Jerry Chou , Ming-Hung Chen , I-Hsin Chung

Advances in GPU compute throughput and memory capacity brings significant opportunities to a wide range of workloads. However, efficiently utilizing these resources remains challenging, particularly because diverse application…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-10 Gabin Schieffer , Ruimin Shi , Jie Ren , Ivy Peng
‹ Prev 1 2 3 10 Next ›