Applying Process Mining on Scientific Workflows: a Case Study on High Performance Computing Data

Databases 2025-02-17 v2

Abstract

Computer-based scientific experiments are becoming increasingly data-intensive, necessitating the use of High-Performance Computing (HPC) clusters to handle large scientific workflows. These workflows result in complex data and control flows within the system, making analysis challenging. This paper focuses on the extraction of case IDs from SLURM-based HPC cluster logs, a crucial step for applying mainstream process mining techniques. The core contribution is the development of methods to correlate jobs in the system, whether their interdependencies are explicitly specified or not. We present our log extraction and correlation techniques, supported by experiments that validate our approach, enabling comprehensive documentation of workflows and identification of performance bottlenecks.

Cite

@article{arxiv.2307.02833,
  title  = {Applying Process Mining on Scientific Workflows: a Case Study on High Performance Computing Data},
  author = {Zahra Sadeghibogar and Alessandro Berti and Marco Pegoraro and Wil M. P. van der Aalst},
  journal= {arXiv preprint arXiv:2307.02833},
  year   = {2025}
}

Comments

12 pages, 8 figures, 3 tables, 9 references