Ontology-based knowledge graph infrastructure for interoperable atomistic simulation data

Databases 2026-04-09 v1 Materials Science Artificial Intelligence

Abstract

The reuse of atomistic simulation data is often limited by heterogeneous formats, incomplete metadata, and a lack of standardized representations of workflows and provenance. Here we present an ontology-based infrastructure for representing and integrating atomistic simulation data as a knowledge graph. The approach combines domain ontologies with a software framework that enables data capture both from existing datasets and directly from simulation workflows at the point of generation. Heterogeneous data from multiple sources are normalized into a common, ontology-aligned representation, enabling consistent querying and analysis across datasets. We demonstrate these capabilities through the integration of grain boundary data, cross-dataset analysis of material properties, and extraction of derived thermodynamic quantities from existing simulations. In addition, workflows are represented in a machine-readable form, enabling both forward provenance tracking and partial reconstruction of computational procedures. The resulting knowledge graph contains over 750,000 triples describing nearly 8,000 computational samples. This work provides a practical framework for improving the findability, interoperability, and reuse of atomistic simulation data.
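
The normalization and cross-dataset querying described above can be illustrated with a minimal sketch. It is not the authors' software: the abstract names neither the framework nor the ontologies, so the `ex:` prefix, the property names, the record layouts of the two sources, and every function below are hypothetical. Plain Python tuples stand in for RDF triples so that the example runs without third-party libraries.

```python
# Hypothetical namespace prefix; a real graph would use ontology IRIs.
EX = "ex:"


def normalize_a(rec):
    """Source A (hypothetical layout): energies already in eV."""
    return {"id": rec["name"], "element": rec["species"], "energy_eV": rec["E_eV"]}


def normalize_b(rec):
    """Source B (hypothetical layout): energies in meV, converted to eV."""
    return {
        "id": rec["sample_id"],
        "element": rec["elem"],
        "energy_eV": rec["energy_meV"] / 1000.0,
    }


def to_triples(sample, source):
    """Express one normalized sample as (subject, predicate, object) triples."""
    s = EX + sample["id"]
    return {
        (s, "rdf:type", EX + "ComputationalSample"),
        (s, EX + "hasElement", sample["element"]),
        (s, EX + "hasEnergy_eV", sample["energy_eV"]),
        (s, EX + "fromSource", source),
    }


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def energies_for_element(graph, element):
    """Cross-dataset query: energy of every sample of the given element."""
    subjects = [t[0] for t in match(graph, p=EX + "hasElement", o=element)]
    return {s: match(graph, s=s, p=EX + "hasEnergy_eV")[0][2] for s in subjects}


# Two heterogeneous sources with different field names and units (made-up values).
source_a = [
    {"name": "a1", "species": "Fe", "E_eV": -4.0},
    {"name": "a2", "species": "Al", "E_eV": -3.5},
]
source_b = [{"sample_id": "b7", "elem": "Fe", "energy_meV": -4250.0}]

graph = set()
for rec in source_a:
    graph |= to_triples(normalize_a(rec), "A")
for rec in source_b:
    graph |= to_triples(normalize_b(rec), "B")

fe_energies = energies_for_element(graph, "Fe")
```

Once both sources share one vocabulary and one unit, a single query covers samples from either dataset. In a real deployment the same pattern would typically be written as a SPARQL query against an RDF store.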

Cite

@article{arxiv.2604.06230,
  title  = {Ontology-based knowledge graph infrastructure for interoperable atomistic simulation data},
  author = {Abril Azocar Guzman and Sarath Menon and Tilmann Hickel and Stefan Sandfeld},
  journal= {arXiv preprint arXiv:2604.06230},
  year   = {2026}
}