Applying Process Mining on Scientific Workflows: a Case Study on High Performance Computing Data

Zahra Sadeghibogar; Alessandro Berti; Marco Pegoraro; Wil M. P. van der Aalst

Databases 2025-02-17 v2

Abstract

Computer-based scientific experiments are becoming increasingly data-intensive, necessitating the use of High-Performance Computing (HPC) clusters to handle large scientific workflows. These workflows result in complex data and control flows within the system, making analysis challenging. This paper focuses on the extraction of case IDs from SLURM-based HPC cluster logs, a crucial step for applying mainstream process mining techniques. The core contribution is the development of methods to correlate jobs in the system, whether their interdependencies are explicitly specified or not. We present our log extraction and correlation techniques, supported by experiments that validate our approach, enabling comprehensive documentation of workflows and identification of performance bottlenecks.
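
To illustrate what correlating jobs into cases can look like when interdependencies are explicitly specified, here is a minimal sketch. It is not the authors' method: the input format (a mapping from each job ID to the job IDs it depends on), the function name `assign_case_ids`, and the choice of the smallest job ID as the case ID are all assumptions made for illustration. Jobs connected by a chain of dependencies are grouped into one case with a union-find structure; correlating jobs without explicit dependencies is not covered.

```python
def assign_case_ids(jobs):
    """Group jobs into cases using explicit dependencies only.

    `jobs` is a hypothetical input: {job_id: [ids of jobs it depends on]}.
    Returns {job_id: case_id}, where case_id is the smallest job ID
    among all jobs connected through dependency links.
    """
    parent = {}

    def find(x):
        # Register unseen jobs as their own group, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        # Keep the smaller ID as root so the root is the group's minimum.
        if ra < rb:
            parent[rb] = ra
        else:
            parent[ra] = rb

    for job_id, deps in jobs.items():
        find(job_id)
        for dep in deps:
            union(job_id, dep)

    return {job: find(job) for job in parent}


# Hypothetical job records: two independent workflows.
example_jobs = {101: [], 102: [101], 103: [102], 200: [], 201: [200]}
case_ids = assign_case_ids(example_jobs)
```

In this example, jobs 101, 102, and 103 end up in one case and jobs 200 and 201 in another, so each connected group of jobs can be treated as one trace by a process mining tool.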
