Related papers: stream-learn -- open-source Python library for dif…

River: machine learning for streaming data in Python

River is a machine learning library for dynamic data streams and continual learning. It provides multiple state-of-the-art learning methods, data generators/transformers, performance metrics and evaluators for different stream learning…

Machine Learning · Computer Science 2020-12-10 Jacob Montiel, Max Halford, Saulo Martiello Mastelini, Geoffrey Bolmier, Raphael Sourty, Robin Vaysse, Adil Zouitine, Heitor Murilo Gomes, Jesse Read, Talel Abdessalem, Albert Bifet

Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning

Imbalanced-learn is an open-source Python toolbox aiming at providing a wide range of methods to cope with the problem of imbalanced datasets frequently encountered in machine learning and pattern recognition. The implemented…

Machine Learning · Computer Science 2016-09-22 Guillaume Lemaitre, Fernando Nogueira, Christos K. Aridas

scikit-dyn2sel -- A Dynamic Selection Framework for Data Streams

Mining data streams is a challenge per se. It must be ready to deal with an enormous amount of data and with problems not present in batch machine learning, such as concept drift. Therefore, applying a batch-designed technique, such as…

Machine Learning · Computer Science 2020-08-21 Lucca Portes Cavalheiro, Jean Paul Barddal, Alceu de Souza Britto, Laurent Heutte

Seglearn: A Python Package for Learning Sequences and Time Series

Seglearn is an open-source Python package for machine learning time series or sequences using a sliding window segmentation approach. The implementation provides a flexible pipeline for tackling classification, regression, and forecasting…

Machine Learning · Statistics 2019-01-28 David M. Burns, Cari M. Whyne

Scikit-Multiflow: A Multi-output Streaming Framework

Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage democratization of stream learning research, it provides multiple state of…

Machine Learning · Computer Science 2020-05-18 Jacob Montiel, Jesse Read, Albert Bifet, Talel Abdessalem

Scikit-learn: Machine Learning in Python

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a…

Machine Learning · Computer Science 2018-06-06 Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay

Incremental Learning with Concept Drift Detection and Prototype-based Embeddings for Graph Stream Classification

Data stream mining aims at extracting meaningful knowledge from continually evolving data streams, addressing the challenges posed by nonstationary environments, particularly, concept drift which refers to a change in the underlying data…

Machine Learning · Computer Science 2025-01-03 Kleanthis Malialis, Jin Li, Christos G. Panayiotou, Marios M. Polycarpou

srlearn: A Python Library for Gradient-Boosted Statistical Relational Models

We present srlearn, a Python library for boosted statistical relational models. We adapt the scikit-learn interface to this setting and provide examples for how this can be used to express learning and inference problems.

Machine Learning · Computer Science 2019-12-19 Alexander L. Hayes

SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams

Many real-world data stream applications not only suffer from concept drift but also class imbalance. Yet, very few existing studies investigated this joint challenge. Data difficulty factors, which have been shown to be key challenges in…

Machine Learning · Computer Science 2023-08-30 Chun Wai Chiu, Leandro L. Minku

metric-learn: Metric Learning Algorithms in Python

metric-learn is an open source Python package implementing supervised and weakly-supervised distance metric learning algorithms. As part of scikit-learn-contrib, it provides a unified interface compatible with scikit-learn which allows to…

Machine Learning · Computer Science 2020-07-28 William de Vazelhes, CJ Carey, Yuan Tang, Nathalie Vauquier, Aurélien Bellet

Resilient Class-Incremental Learning: on the Interplay of Drifting, Unlabelled and Imbalanced Data Streams

In today's connected world, the generation of massive streaming data across diverse domains has become commonplace. In the presence of concept drift, class imbalance, label scarcity, and new class emergence, they jointly degrade…

Machine Learning · Computer Science 2026-02-11 Jin Li, Kleanthis Malialis, Marios Polycarpou

A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework

Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, there…

Machine Learning · Computer Science 2023-07-19 Gabriel Aguiar, Bartosz Krawczyk, Alberto Cano

mvlearn: Multiview Machine Learning in Python

As data are generated more and more from multiple disparate sources, multiview data sets, where each sample has features in distinct views, have ballooned in recent years. However, no comprehensive package exists that enables…

Machine Learning · Statistics 2021-05-27 Ronan Perry, Gavin Mischler, Richard Guo, Theodore Lee, Alexander Chang, Arman Koul, Cameron Franz, Hugo Richard, Iain Carmichael, Pierre Ablin, Alexandre Gramfort, Joshua T. Vogelstein

Imbalanced Data Stream Classification using Dynamic Ensemble Selection

Modern streaming data categorization faces significant challenges from concept drift and class imbalanced data. This negatively impacts the output of the classifier, leading to improper classification. Furthermore, other factors such as the…

Machine Learning · Computer Science 2023-09-29 Priya. S, Haribharathi Sivakumar, Vijay Arvind. R

Pymc-learn: Practical Probabilistic Machine Learning in Python

Pymc-learn is a Python package providing a variety of state-of-the-art probabilistic models for supervised and unsupervised machine learning. It is inspired by scikit-learn and focuses on bringing probabilistic machine…

Machine Learning · Statistics 2018-11-05 Daniel Emaasit

Scikit-network: Graph Analysis in Python

Scikit-network is a Python package inspired by scikit-learn for the analysis of large graphs. Graphs are represented by their adjacency matrix in the sparse CSR format of SciPy. The package provides state-of-the-art algorithms for ranking,…

Social and Information Networks · Computer Science 2020-09-17 Thomas Bonald, Nathan de Lara, Quentin Lutz, Bertrand Charpentier

HyperStream: a Workflow Engine for Streaming Data

This paper describes HyperStream, a large-scale, flexible and robust software package, written in the Python language, for processing streaming data with workflow creation capabilities. HyperStream overcomes the limitations of other…

Machine Learning · Computer Science 2019-08-09 Tom Diethe, Meelis Kull, Niall Twomey, Kacper Sokol, Hao Song, Miquel Perello-Nieto, Emma Tonkin, Peter Flach

Concept Drift Detection from Multi-Class Imbalanced Data Streams

Continual learning from data streams is among the most important topics in contemporary machine learning. One of the biggest challenges in this domain lies in creating algorithms that can continuously adapt to arriving data. However,…

Machine Learning · Computer Science 2021-04-22 Łukasz Korycki, Bartosz Krawczyk

A scikit-based Python environment for performing multi-label classification

scikit-multilearn is a Python library for performing multi-label classification. The library is compatible with the scikit/scipy ecosystem and uses sparse matrices for all internal operations. It provides native Python implementations of…

Machine Learning · Computer Science 2018-12-11 Piotr Szymański, Tomasz Kajdanowicz

Standardized Evaluation of Machine Learning Methods for Evolving Data Streams

Due to the unspecified and dynamic nature of data streams, online machine learning requires powerful and flexible solutions. However, evaluating online machine learning methods under realistic conditions is difficult. Existing work…

Machine Learning · Computer Science 2022-04-29 Johannes Haug, Effi Tramountani, Gjergji Kasneci