Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science

Software Engineering 2019-05-16 v1

Abstract

Process mining, a sub-field of data science that focuses on analyzing event data generated during the execution of (business) processes, has changed tremendously over the past two decades. In the early 2000s there was little to no tool support; today, several software tools exist, both open-source (e.g., ProM and Apromore) and commercial (e.g., Disco, Celonis, and ProcessGold). The commercial process mining tools provide limited support for implementing custom algorithms. Moreover, both commercial and open-source process mining tools are often accessible only through a graphical user interface, which hampers their use in large-scale experimental settings. Initiatives such as RapidProM provide process mining support in the scientific workflow-based data science suite RapidMiner. However, these offer limited to no support for algorithmic customization. In light of this, we present a novel process mining library, Process Mining for Python (PM4Py), which aims to bridge this gap by providing integration with state-of-the-art data science libraries, e.g., pandas, numpy, scipy, and scikit-learn. We provide a global overview of the architecture and functionality of PM4Py, accompanied by some representative examples of its usage.
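To make "analysis of event data" concrete, the following is a minimal sketch of one basic process mining computation, counting directly-follows relations between activities, written with plain pandas. It deliberately does not use the PM4Py API; the column names (`case_id`, `activity`, `timestamp`), the toy log, and the helper `directly_follows_counts` are all assumptions made for illustration.

```python
import pandas as pd

# Toy event log. The column names are illustrative assumptions,
# not a schema prescribed by PM4Py.
events = pd.DataFrame(
    {
        "case_id": ["c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3"],
        "activity": [
            "register", "check", "pay",
            "register", "pay", "check",
            "register", "check", "pay",
        ],
        "timestamp": pd.to_datetime(
            [
                "2019-01-01 09:00", "2019-01-01 10:00", "2019-01-01 11:00",
                "2019-01-02 09:00", "2019-01-02 10:00", "2019-01-02 11:00",
                "2019-01-03 09:00", "2019-01-03 10:00", "2019-01-03 11:00",
            ]
        ),
    }
)


def directly_follows_counts(log: pd.DataFrame) -> dict:
    """Count how often one activity is immediately followed by another
    within the same case (hypothetical helper, not part of PM4Py)."""
    # Order events chronologically inside each case.
    ordered = log.sort_values(["case_id", "timestamp"], kind="stable")
    # For every event, look up the next activity in the same case.
    successor = ordered.groupby("case_id")["activity"].shift(-1)
    pairs = pd.DataFrame(
        {"source": ordered["activity"], "target": successor}
    ).dropna()  # the last event of each case has no successor
    return pairs.groupby(["source", "target"]).size().to_dict()


dfg = directly_follows_counts(events)
for (source, target), count in sorted(dfg.items()):
    print(f"{source} -> {target}: {count}")
```

Because the log is an ordinary DataFrame, the same data can be filtered, joined, or passed to numpy, scipy, or scikit-learn without conversion, which is the kind of integration the abstract refers to.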

Cite

@article{arxiv.1905.06169,
  title  = {Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science},
  author = {Alessandro Berti and Sebastiaan J. van Zelst and Wil van der Aalst},
  journal= {arXiv preprint arXiv:1905.06169},
  year   = {2019}
}