A Programming Model for GPU Load Balancing

Muhammad Osama; Serban D. Porumbescu; John D. Owens

doi:10.1145/3572848.3577434

Distributed, Parallel, and Cluster Computing 2023-01-13 v1

Abstract

We propose a fine-grained GPU load-balancing abstraction that decouples load balancing from work processing and aims to support both static and dynamic schedules, with a programmable interface for implementing new load-balancing schedules. Prior to our work, the only way to unleash the GPU's potential on irregular problems has been to balance workloads through application-specific, tightly coupled load-balancing techniques. With our open-source load-balancing framework, we hope to improve programmers' productivity when developing irregular-parallel algorithms on the GPU, and also to improve the overall performance characteristics of such applications by allowing a quick path to experimentation with a variety of existing load-balancing techniques. We also hope that, by separating the concerns of load balancing from work processing within our abstraction, managing and extending existing code to future architectures becomes easier.
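The separation the abstract describes can be illustrated with a small CPU-side sketch. This is not the framework's API: it is a plain Python simulation, and every name in it (`row_per_worker`, `even_nonzero_split`, `spmv`) is a hypothetical chosen for illustration. It assumes a sparse matrix-vector multiply over a CSR matrix as the irregular workload, and shows two interchangeable schedules feeding the same work-processing loop.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple

# Hypothetical types: a schedule maps CSR row offsets and a worker count
# to one list of (row, nonzero_index) work items per simulated worker.
WorkItems = List[Tuple[int, int]]
Schedule = Callable[[List[int], int], List[WorkItems]]


def row_per_worker(offsets: List[int], num_workers: int) -> List[WorkItems]:
    """Schedule 1: rows are dealt round-robin, so long rows overload a worker."""
    plan: List[WorkItems] = [[] for _ in range(num_workers)]
    for row in range(len(offsets) - 1):
        for nz in range(offsets[row], offsets[row + 1]):
            plan[row % num_workers].append((row, nz))
    return plan


def even_nonzero_split(offsets: List[int], num_workers: int) -> List[WorkItems]:
    """Schedule 2: nonzeros are split evenly; each item's row is found by
    binary search over the row offsets."""
    nnz = offsets[-1]
    chunk = -(-nnz // num_workers)  # ceiling division
    plan: List[WorkItems] = []
    for w in range(num_workers):
        items: WorkItems = []
        for nz in range(w * chunk, min((w + 1) * chunk, nnz)):
            row = bisect_right(offsets, nz) - 1  # last row starting at or before nz
            items.append((row, nz))
        plan.append(items)
    return plan


def spmv(offsets, cols, vals, x, schedule: Schedule, num_workers: int = 2):
    """Work processing: the same loop body regardless of the schedule used."""
    y = [0.0] * (len(offsets) - 1)
    for items in schedule(offsets, num_workers):  # one list per simulated worker
        for row, nz in items:
            y[row] += vals[nz] * x[cols[nz]]
    return y


# A 3x3 matrix with a long row, an empty row, and a short row.
offsets = [0, 5, 5, 6]
cols = [0, 1, 2, 0, 1, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0]

y_rows = spmv(offsets, cols, vals, x, row_per_worker)
y_split = spmv(offsets, cols, vals, x, even_nonzero_split)
```

Swapping the `schedule` argument changes how work items are distributed across workers without touching the multiply-accumulate code, which is the kind of decoupling the abstract argues for. On a real GPU the workers would be threads, warps, or blocks, and concurrent updates to `y[row]` would need a reduction or atomics, which this sequential sketch omits.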

Keywords

gpu computing large language model inference scheduling

Cite

@article{arxiv.2301.04792,
  title  = {A Programming Model for GPU Load Balancing},
  author = {Muhammad Osama and Serban D. Porumbescu and John D. Owens},
  journal= {arXiv preprint arXiv:2301.04792},
  year   = {2023}
}

Comments

This work previously appeared in the author's PhD dissertation, available at arXiv:2212.08964. It was also published in the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP '23).
