Integrative Data Semantics through a Model-enabled Data Stewardship

Quantitative Methods 2021-11-19 v1

Abstract

Motivation: The importance of clinical data in understanding the pathophysiology of complex disorders has prompted the launch of multiple initiatives designed to generate patient-level data from various modalities. While these studies can each reveal important findings relevant to the disease, every study captures different yet complementary aspects and modalities which, when combined, give a more comprehensive picture of disease aetiology. However, achieving this requires a global integration of data across studies, which is challenging given the lack of interoperability between cohort datasets.

Results: Here, we present the Data Steward Tool (DST), an application that allows for semi-automatic semantic integration of clinical data into ontologies, global data models, and data standards. We demonstrate the applicability of the tool in the field of dementia research by establishing a Clinical Data Model (CDM) in this domain. The CDM currently consists of 277 common variables covering demographics (e.g. age and gender), diagnostics, neuropsychological tests, and biomarker measurements. The DST, combined with this disease-specific data model, shows how interoperability between multiple heterogeneous dementia datasets can be achieved.

Cite

@article{arxiv.2111.09313,
  title  = {Integrative Data Semantics through a Model-enabled Data Stewardship},
  author = {Philipp Wegner and Sebastian Schaaf and Mischa Uebachs and Daniel Domingo-Fernández and Yasamin Salimi and Stephan Gebel and Astghik Sargsyan and Colin Birkenbihl and Stephan Springstubbe and Thomas Klockgether and Juliane Fluck and Martin Hofmann-Apitius and Alpha Tom Kodamullil},
  journal= {arXiv preprint arXiv:2111.09313},
  year   = {2021}
}

Comments

13 pages, 5 figures, 2 tables
