stream-learn -- open-source Python library for difficult data stream batch analysis

Machine Learning 2020-01-31 v1 Computer Vision and Pattern Recognition Machine Learning

Abstract

stream-learn is a Python package compatible with scikit-learn, developed for the analysis of drifting and imbalanced data streams. Its main component is a stream generator, which produces synthetic data streams that may incorporate each of the three main concept drift types (i.e. sudden, gradual and incremental drift) in their recurring or non-recurring versions. The package supports experiments that follow established evaluation methodologies (i.e. Test-Then-Train and Prequential). In addition, estimators adapted for data stream classification have been implemented, including both simple classifiers and state-of-the-art chunk-based and online classifier ensembles. To improve computational efficiency, the package uses its own implementations of prediction metrics for imbalanced binary classification tasks.

Cite

@article{arxiv.2001.11077,
  title  = {stream-learn -- open-source Python library for difficult data stream batch analysis},
  author = {Paweł Ksieniewicz and Paweł Zyblewski},
  journal= {arXiv preprint arXiv:2001.11077},
  year   = {2020}
}