Data Lakes for Digital Humanities

Databases 2020-12-07 v1

Abstract

Traditional data in Digital Humanities projects come in various formats (structured, semi-structured, textual) and require substantial transformations (encoding and tagging, stemming, lemmatization, etc.) before they can be managed and analyzed. To fully master this process, we propose using data lakes as a solution to the data siloing and big data variety problems. We describe the data lake projects we currently run in close collaboration with researchers in the humanities and social sciences, and discuss the lessons learned from running them.

Cite

@article{arxiv.2012.02454,
  title  = {Data Lakes for Digital Humanities},
  author = {Jérôme Darmont and Cécile Favre and Sabine Loudcher and Camille Noûs},
  journal= {arXiv preprint arXiv:2012.02454},
  year   = {2020}
}

Comments

Data and Digital Humanities Track