
Taking GPU Programming Models to Task for Performance Portability

Subjects: Distributed, Parallel, and Cluster Computing; Performance. Version v4, 2025-09-08.

Abstract

Portability is critical to ensuring high productivity in developing and maintaining scientific software as the diversity of on-node hardware architectures increases. While several programming models provide portability across diverse GPU systems, they make no guarantees about performance portability. In this work, we explore several programming models (CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL) to assess the consistency of their performance across NVIDIA and AMD GPUs. We use five proxy applications from different scientific domains, create implementations where they are missing, and use them to present a comprehensive comparative evaluation of the performance portability of these programming models. We provide a Spack scripting-based methodology to ensure the reproducibility of the experiments conducted in this work. Finally, we analyze why some programming models underperform in certain scenarios and, in some cases, present performance optimizations for the proxy applications.
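The abstract does not state which performance portability metric is used. One widely used definition is the harmonic-mean metric of Pennycook et al.: given a set of platforms H, it is |H| divided by the sum of 1/e_i over the platforms i in H, where e_i is an efficiency on platform i, and it is defined as 0 if any platform in H is unsupported. The minimal sketch below computes that metric, assuming application efficiency (best observed runtime divided by the model's runtime) as e_i. The function names, platform labels, and numbers are hypothetical and illustrative only; they are not results from this work.

```python
from typing import Mapping, Optional


def app_efficiency(time: float, best_time: float) -> float:
    """Application efficiency: best observed runtime / this model's runtime."""
    if time <= 0 or best_time <= 0:
        raise ValueError("runtimes must be positive")
    return best_time / time


def performance_portability(efficiencies: Mapping[str, Optional[float]]) -> float:
    """Harmonic-mean performance portability (Pennycook et al. definition).

    `efficiencies` maps a platform label to an efficiency in (0, 1],
    or to None if the programming model cannot run on that platform.
    """
    if not efficiencies:
        raise ValueError("need at least one platform")
    # By definition, the metric is 0 if any platform in the set is unsupported.
    if any(e is None or e <= 0 for e in efficiencies.values()):
        return 0.0
    # Harmonic mean of the per-platform efficiencies.
    return len(efficiencies) / sum(1.0 / e for e in efficiencies.values())


if __name__ == "__main__":
    # Hypothetical efficiencies for one programming model on two GPU platforms.
    print(performance_portability({"nvidia": 0.9, "amd": 0.6}))
```

The harmonic mean is chosen because it penalizes a poor result on any one platform more heavily than an arithmetic mean would: here 0.9 and 0.6 combine to 0.72 rather than 0.75.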


Cite

@article{arxiv.2402.08950,
  title  = {Taking GPU Programming Models to Task for Performance Portability},
  author = {Joshua H. Davis and Pranav Sivaraman and Joy Kitson and Konstantinos Parasyris and Harshitha Menon and Isaac Minn and Giorgis Georgakoudis and Abhinav Bhatele},
  journal= {arXiv preprint arXiv:2402.08950},
  year   = {2025}
}

Comments

16 pages, 5 figures