Related papers: Specifying and Testing GPU Workgroup Progress Mode…

GPU-Acceleration of Parallel Unconditionally Stable Group Explicit Finite Difference Method

Graphics Processing Units (GPUs) are high performance co-processors originally intended to improve the use and quality of computer graphics applications. Once researchers and practitioners noticed the potential of using GPU for general…

Numerical Analysis · Computer Science 2016-07-12 K. Parand, Saeed Zafarvahedian, Sayyed A. Hossayni

RTGPU: Real-Time GPU Scheduling of Hard Deadline Parallel Tasks with Fine-Grain Utilization

Many emerging cyber-physical systems, such as autonomous vehicles and robots, rely heavily on artificial intelligence and machine learning algorithms to perform important system operations. Since these highly parallel applications are…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-07 An Zou, Jing Li, Christopher D. Gill, Xuan Zhang

A Variant of Concurrent Constraint Programming on GPU

The number of cores on graphical computing units (GPUs) is reaching thousands nowadays, whereas the clock speed of processors stagnates. Unfortunately, constraint programming solvers do not yet take advantage of GPU parallelism. One reason…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-26 Pierre Talbot, Frédéric Pinel, Pascal Bouvry

Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM

Training Large Language Models (LLMs) is one of the most compute-intensive tasks in high-performance computing. Predicting end-to-end training time for multi-billion parameter models distributed across hundreds of GPUs remains challenging…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-30 Biyao Zhang, Mingkai Zheng, Debargha Ganguly, Xuecen Zhang, Vikash Singh, Vipin Chaudhary, Zhao Zhang

Can Large Language Models Predict Parallel Code Performance?

Accurate determination of the performance of parallel GPU code typically requires execution-time profiling on target hardware -- an increasingly prohibitive step due to limited access to high-end GPUs. This paper explores whether Large…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-08 Gregory Bolet, Giorgis Georgakoudis, Harshitha Menon, Konstantinos Parasyris, Niranjan Hasabnis, Hayden Estes, Kirk W. Cameron, Gal Oren

Exploring the Limits of GPUs With Parallel Graph Algorithms

In this paper, we explore the limits of graphics processors (GPUs) for general purpose parallel computing by studying problems that require highly irregular data access patterns: parallel graph algorithms for list ranking and connected…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-02-25 Frank Dehne, Kumanan Yogaratnam

GPU-Accelerated Verification of Machine Learning Models for Power Systems

Computational tools for rigorously verifying the performance of large-scale machine learning (ML) models have progressed significantly in recent years. The most successful solvers employ highly specialized, GPU-accelerated branch and bound…

Machine Learning · Computer Science 2023-09-11 Samuel Chevalier, Ilgiz Murzakhanov, Spyros Chatzivasileiadis

A Tool for Automatically Suggesting Source-Code Optimizations for Complex GPU Kernels

Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than today's systems to provide more performance and energy efficiency. Thus, GPUs are increasingly being used to accelerate…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-18 Saeed Taheri, Apan Qasem, Martin Burtscher

Safe Large-Scale Robust Nonlinear MPC in Milliseconds via Reachability-Constrained System Level Synthesis on the GPU

We present GPU-SLS, a GPU-parallelized framework for safe, robust nonlinear model predictive control (MPC) that scales to high-dimensional uncertain robotic systems and long planning horizons. Our method jointly optimizes an…

Robotics · Computer Science 2026-04-10 Jeffrey Fang, Glen Chou

A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling

The ability to model, analyze, and predict execution time of computations is an important building block supporting numerous efforts, such as load balancing, performance optimization, and automated performance tuning for high performance,…

Performance · Computer Science 2020-06-22 James D. Stevens, Andreas Klöckner

On the performance of various parallel GMRES implementations on CPU and GPU clusters

As the need for computational power and efficiency rises, parallel systems become increasingly popular among various scientific fields. While multiple core-based architectures have been the center of attention for many years, the rapid…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-11 E. I. Ioannidis, N. Cheimarios, A. N. Spyropoulos, A. G. Boudouvis

A Programming Model for GPU Load Balancing

We propose a GPU fine-grained load-balancing abstraction that decouples load balancing from work processing and aims to support both static and dynamic schedules with a programmable interface to implement new load-balancing schedules. Prior…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-13 Muhammad Osama, Serban D. Porumbescu, John D. Owens

Understanding GPU Resource Interference One Level Deeper

GPUs are vastly underutilized, even when running resource-intensive AI applications, as GPU kernels within each job have diverse resource profiles that may saturate some parts of a device while often leaving other parts idle. Colocating…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-17 Paul Elvinger, Foteini Strati, Natalie Enright Jerger, Ana Klimovic

Concurrent Scheduling of High-Level Parallel Programs on Multi-GPU Systems

Parallel programming models can encourage performance portability by moving the responsibility for work assignment and data distribution from the programmer to a runtime system. However, analyzing the resulting implicit memory allocations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-14 Fabian Knorr, Philip Salzmann, Peter Thoman, Thomas Fahringer

Expression Acceleration: Seamless Parallelization of Typed High-Level Languages

Efficient parallelization of algorithms on general-purpose GPUs is essential in many areas today. However, it is a non-trivial task for software engineers to utilize GPUs to improve the performance of high-level programs in general.…

Programming Languages · Computer Science 2024-07-09 Lars Hummelgren, John Wikman, Oscar Eriksson, Philipp Haller, David Broman

GPU Load Balancing

Fine-grained workload and resource balancing is the key to high performance for regular and irregular computations on GPUs. In this dissertation, we conduct an extensive survey of existing load-balancing techniques to build an…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-20 Muhammad Osama

Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric

Modern GPU systems are constantly evolving to meet the needs of computing-intensive applications in scientific and machine learning domains. However, there is typically a gap between the hardware capacity and the achievable application…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-02 Gabin Schieffer, Ruimin Shi, Stefano Markidis, Andreas Herten, Jennifer Faj, Ivy Peng

GTaP: A GPU-Resident Fork-Join Task-Parallel Runtime with a Pragma-Based Interface

Graphics Processing Units (GPUs) excel at regular data-parallel workloads where massive hardware parallelism can be readily exploited. In contrast, many important irregular applications are naturally expressed as task parallelism with a…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-08 Yuki Maeda, Kenjiro Taura

Contract-Based General-Purpose GPU Programming

Using GPUs as general-purpose processors has revolutionized parallel computing by offering, for a large and growing set of algorithms, massive data-parallelization on desktop machines. An obstacle to widespread adoption, however, is the…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-14 Alexey Kolesnichenko, Christopher M. Poskitt, Sebastian Nanz, Bertrand Meyer

Scalable GPU-Based Integrity Verification for Large Machine Learning Models

We present a security framework that strengthens distributed machine learning by standardizing integrity protections across CPU and GPU platforms and significantly reducing verification overheads. Our approach co-locates integrity…

Cryptography and Security · Computer Science 2025-10-29 Marcin Spoczynski, Marcela S. Melara