Related papers: Decoupling GPU Programming Models from Resource Ma…

Zorua: Enhancing Programming Ease, Portability, and Performance in GPUs by Decoupling Programming Models from Resource Management

The application resource specification--a static specification of several parameters such as the number of threads and the scratchpad memory usage per thread block--forms a critical component of the existing GPU programming models. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-09 Nandita Vijaykumar , Kevin Hsieh , Gennady Pekhimenko , Samira Khan , Ashish Shrestha , Saugata Ghose , Phillip B. Gibbons , Onur Mutlu

Efficient Resource Sharing Through GPU Virtualization on Accelerated High Performance Computing Systems

The High Performance Computing (HPC) field is witnessing a widespread adoption of Graphics Processing Units (GPUs) as co-processors for conventional homogeneous clusters. The adoption of prevalent Single-Program Multiple-Data (SPMD)…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-25 Teng Li , Vikram K. Narayana , Tarek El-Ghazawi

Prediction of Performance and Power Consumption of GPGPU Applications

Graphics Processing Units (GPUs) have become an integral part of High-Performance Computing to achieve an Exascale performance. The main goal of application developers of GPU is to tune their code extensively to obtain optimal performance,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-04 Gargi Alavani , Santonu Sarkar

Understanding GPU Resource Interference One Level Deeper

GPUs are vastly underutilized, even when running resource-intensive AI applications, as GPU kernels within each job have diverse resource profiles that may saturate some parts of a device while often leaving other parts idle. Colocating…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-17 Paul Elvinger , Foteini Strati , Natalie Enright Jerger , Ana Klimovic

Effective GPU Sharing Under Compiler Guidance

Modern computing platforms tend to deploy multiple GPUs (2, 4, or more) on a single node to boost system performance, with each GPU having a large capacity of global memory and streaming multiprocessors (SMs). GPUs are an expensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-20 Chao Chen , Chris Porter , Santosh Pande

Solving the Resource Constrained Project Scheduling Problem Using the Parallel Tabu Search Designed for the CUDA Platform

In the paper, a parallel Tabu Search algorithm for the Resource Constrained Project Scheduling Problem is proposed. To deal with this NP-hard combinatorial problem many optimizations have been performed. For example, a resource evaluation…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-15 Libor Bukata , Premysl Sucha , Zdenek Hanzalek

VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU

Modern GPUs increasingly rely on specialized and asynchronous hardware units to deliver high performance. Yet these units are often underutilized because today's GPU software stacks still organize programming and execution around a…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-06 Zijian He , Adrian Sampson , Yiying Zhang , Zhiyuan Guo

Improving GPU Performance Through Resource Sharing

Graphics Processing Units (GPUs) consisting of Streaming Multiprocessors (SMs) achieve high throughput by running a large number of threads and context switching among them to hide execution latencies. The number of thread blocks, and hence…

Hardware Architecture · Computer Science 2015-06-08 Vishwesh Jatala , Jayvant Anantpur , Amey Karkare

Techniques for Shared Resource Management in Systems with Throughput Processors

The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun

Improving the performance of the linear systems solvers using CUDA

Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Bogdan Oancea , Tudorel Andrei , Raluca Mariana Dragoescu

Advanced Programming Platform for efficient use of Data Parallel Hardware

Graphics processing units (GPU) had evolved from a specialized hardware capable to render high quality graphics in games to a commodity hardware for effective processing blocks of data in a parallel schema. This evolution is particularly…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-26 Luis Cabellos

GPGPU Based Parallelized Client-Server Framework for Providing High Performance Computation Support

Parallel data processing has become indispensable for processing applications involving huge data sets. This brings into focus the Graphics Processing Units (GPUs) which emphasize on many-core computing. With the advent of General Purpose…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-22 Poorna Banerjee , Amit Dave

Power Consumption Analysis of Parallel Algorithms on GPUs

Due to their highly parallel multi-cores architecture, GPUs are being increasingly used in a wide range of computationally intensive applications. Compared to CPUs, GPUs can achieve higher performances at accelerating the programs'…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-05 Frédéric Magoulès , Abal-Kassim Cheik Ahamed , Alban Desmaison , Jean-Christophe Léchenet , François Mayer , Haifa Ben Salem , Thomas Zhu

GigaAPI for GPU Parallelization

GigaAPI is a user-space API that simplifies multi-GPU programming, bridging the gap between the capabilities of parallel GPU systems and the ability of developers to harness their full potential. The API offers a comprehensive set of…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-03 M. Suvarna , O. Tehrani

Analyzing Molecular Simulations Trajectories by Utilizing CUDA on GPU Architecture

With the advent of high-performance computing techniques, the data for analysis has grown significantly. Here, graphic processing unit (GPU) based program kernels are discussed to exploit parallelism in the analysis codes specific to…

Computational Physics · Physics 2018-11-07 Gourav Shrivastav , Manish Agarwal

A Programming Model for GPU Load Balancing

We propose a GPU fine-grained load-balancing abstraction that decouples load balancing from work processing and aims to support both static and dynamic schedules with a programmable interface to implement new load-balancing schedules. Prior…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-13 Muhammad Osama , Serban D. Porumbescu , John D. Owens

Co-Optimizing Performance and Memory Footprint Via Integrated CPU/GPU Memory Management, an Implementation on Autonomous Driving Platform

Cutting-edge embedded system applications, such as self-driving cars and unmanned drone software, are reliant on integrated CPU/GPU platforms for their DNNs-driven workload, such as perception and other highly parallel components. In this…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-20 Soroush Bateni , Zhendong Wang , Yuankun Zhu , Yang Hu , Cong Liu

Contention-Aware GPU Partitioning and Task-to-Partition Allocation for Real-Time Workloads

In order to satisfy timing constraints, modern real-time applications require massively parallel accelerators such as General Purpose Graphic Processing Units (GPGPUs). Generation after generation, the number of computing clusters made…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-24 Houssam-Eddine Zahaf , Ignacio Sanudo Olmedo , Jayati Singh , Nicola Capodieci , Sebastien Faucou

GPUs as Storage System Accelerators

Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-18 Samer Al-Kiswany , Abdullah Gharaibeh , Matei Ripeanu

LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs

The recent progress made in large language models (LLMs) has brought tremendous application prospects to the world. The growing model size demands LLM training on multiple GPUs, while data parallelism is the most popular distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-04 Mo Sun , Zihan Yang , Changyue Liao , Yingtao Li , Fei Wu , Zeke Wang