The BioExcel methodology for developing dynamic, scalable, reliable and portable computational biomolecular workflows

Distributed, Parallel, and Cluster Computing 2022-12-20 v1

Abstract

Developing complex biomolecular workflows is not always straightforward. It requires tedious development work to make the different biomolecular simulation and analysis tools interoperable. Moreover, the need to execute the pipelines on distributed systems further increases the complexity of this work. To address these issues, we propose a methodology that simplifies the implementation of these workflows on HPC infrastructures. It combines a library, the BioExcel Building Blocks (BioBBs), which allows scientists to implement biomolecular pipelines as Python scripts, with the PyCOMPSs programming framework, which makes it easy to convert Python scripts into task-based parallel workflows executed on distributed computing systems such as HPC clusters, clouds, and containerized platforms. Using this methodology, we have implemented a set of computational molecular workflows and performed several experiments to validate its portability, scalability, reliability and malleability.
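The idea of turning an ordinary Python script into a task-based parallel workflow can be illustrated with a minimal sketch. It uses only the Python standard library as a stand-in and is not the PyCOMPSs or BioBB API: the `task` decorator and the step names `prepare`, `simulate` and `analyze` are hypothetical, chosen only to show how calls that return futures let independent steps run concurrently while data dependencies are still respected.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Shared worker pool; a real runtime would schedule tasks on cluster nodes.
_executor = ThreadPoolExecutor(max_workers=4)


def task(func):
    """Hypothetical stand-in decorator: each call runs asynchronously
    and immediately returns a Future instead of the result."""
    @wraps(func)
    def submit(*args):
        def run():
            # Waiting on Future arguments enforces the data dependencies.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return func(*resolved)
        return _executor.submit(run)
    return submit


@task
def prepare(structure):
    # Placeholder for a structure preparation step.
    return f"{structure}:prepared"


@task
def simulate(prepared, replica):
    # Placeholder for one independent simulation replica.
    return f"{prepared}:sim{replica}"


@task
def analyze(*trajectories):
    # Placeholder for an analysis step that gathers all replicas.
    return sorted(trajectories)


def workflow(structure, replicas=3):
    # Reads like a sequential script, but the replicas run in parallel.
    prepared = prepare(structure)
    trajectories = [simulate(prepared, r) for r in range(replicas)]
    return analyze(*trajectories).result()


if __name__ == "__main__":
    print(workflow("protein"))
```

The body of `workflow` keeps the shape of a plain sequential script. Only the decorator changes how each call executes, which is the property the methodology relies on.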

Cite

@article{arxiv.2208.14130,
  title  = {The BioExcel methodology for developing dynamic, scalable, reliable and portable computational biomolecular workflows},
  author = {Jorge Ejarque and Pau Andrio and Adam Hospital and Javier Conejero and Daniele Lezzi and Josep LL. Gelpi and Rosa M. Badia},
  journal= {arXiv preprint arXiv:2208.14130},
  year   = {2022}
}

Comments

Accepted at the IEEE eScience 2022 conference