
Badgers: generating data quality deficits with Python

Machine Learning 2023-07-11 v1

Abstract

Generating context-specific data quality deficits is necessary to experimentally assess the data quality of data-driven applications, such as artificial intelligence (AI) or machine learning (ML) applications. In this paper, we present badgers, an extensible open-source Python library for generating data quality deficits (outliers, imbalanced data, drift, etc.) for different modalities (tabular data, time series, text, etc.). The documentation is available at https://fraunhofer-iese.github.io/badgers/ and the source code at https://github.com/Fraunhofer-IESE/badgers.
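To make "generating data quality deficits" concrete, here is a minimal, library-independent NumPy sketch of three of the deficit types named in the abstract (outliers, imbalanced data, drift) applied to tabular data. It does not use or reproduce the badgers API. The function names and parameters (`inject_outliers`, `make_imbalanced`, `add_linear_drift`, `fraction`, `keep_fraction`, `slope`) are hypothetical and chosen only for illustration. For the library's actual generators, see the badgers documentation linked above.

```python
import numpy as np

# NOTE: hypothetical helpers for illustration only; this is NOT the badgers API.


def inject_outliers(X, fraction=0.05, scale=5.0, seed=0):
    """Return a copy of X in which a fraction of rows is replaced by points
    lying `scale` standard deviations away from the column means."""
    rng = np.random.default_rng(seed)
    X_out = X.astype(float, copy=True)
    n_out = int(round(fraction * len(X)))
    rows = rng.choice(len(X), size=n_out, replace=False)
    # Random sign per cell so outliers fall on both sides of the mean.
    signs = rng.choice([-1.0, 1.0], size=(n_out, X.shape[1]))
    X_out[rows] = X.mean(axis=0) + signs * scale * X.std(axis=0)
    return X_out, rows


def make_imbalanced(X, y, label, keep_fraction=0.1, seed=0):
    """Drop most samples of class `label` to create class imbalance."""
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(y == label)
    n_keep = int(round(keep_fraction * len(idx)))
    kept = rng.choice(idx, size=n_keep, replace=False)
    mask = y != label  # keep every sample of the other classes
    mask[kept] = True  # plus a small random subset of the target class
    return X[mask], y[mask]


def add_linear_drift(X, slope=0.01):
    """Add a shift that grows linearly with the row index (a simple drift)."""
    t = np.arange(len(X), dtype=float)[:, None]
    return X + slope * t


# Usage on a small synthetic, balanced two-class dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = np.array([0] * 100 + [1] * 100)

X_noisy, outlier_rows = inject_outliers(X, fraction=0.05)
X_imb, y_imb = make_imbalanced(X, y, label=1, keep_fraction=0.1)
X_drift = add_linear_drift(X, slope=0.01)
```

Each helper returns a new array and leaves the original data untouched, so that a model can be evaluated on both the clean data and the degraded variant and the results compared.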

Cite

@article{arxiv.2307.04468,
  title  = {Badgers: generating data quality deficits with Python},
  author = {Julien Siebert and Daniel Seifert and Patricia Kelbert and Michael Kläs and Adam Trendowicz},
  journal= {arXiv preprint arXiv:2307.04468},
  year   = {2023}
}

Comments

17 pages, 16 figures