A Unifying Framework to Enable Artificial Intelligence in High Performance Computing Workflows

Distributed, Parallel, and Cluster Computing 2025-05-06 v1 Software Engineering

Abstract

Current trends point to a future where large-scale scientific applications are tightly coupled HPC/AI hybrids. Hence, we urgently need to invest in creating a seamless, scalable framework in which HPC and AI/ML can work together efficiently and adapt to novel hardware and vendor libraries without starting from scratch every few years. The current ecosystem and sparsely connected community are not sufficient to tackle these challenges, and we require a breakthrough catalyst for science similar to what PyTorch enabled for AI.

Cite

@article{arxiv.2505.02738,
  title  = {A Unifying Framework to Enable Artificial Intelligence in High Performance Computing Workflows},
  author = {Jens Domke and Mohamed Wahib and Anshu Dubey and Tal Ben-Nun and Erik W. Draeger},
  journal= {arXiv preprint arXiv:2505.02738},
  year   = {2025}
}

Comments

Article is still in press; a DOI has already been assigned by the publisher; the publication will appear in Computing in Science & Engineering (CiSE) https://www.computer.org/csdl/magazine/cs