Related papers: scikit-dyn2sel -- A Dynamic Selection Framework fo…

Scikit-Multiflow: A Multi-output Streaming Framework

Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage democratization of stream learning research, it provides multiple state of…

Machine Learning · Computer Science 2020-05-18 Jacob Montiel, Jesse Read, Albert Bifet, Talel Abdessalem

stream-learn -- open-source Python library for difficult data stream batch analysis

stream-learn is a Python package compatible with scikit-learn and developed for drifting and imbalanced data stream analysis. Its main component is a stream generator, which can produce a synthetic data stream that may incorporate…

Machine Learning · Computer Science 2020-01-31 Paweł Ksieniewicz, Paweł Zyblewski

An analytical framework for data stream mining techniques based on challenges and requirements

A growing number of applications that generate massive streams of data need intelligent data processing and online analysis. Real-time surveillance systems, telecommunication systems, sensor networks and other dynamic environments are such…

Databases · Computer Science 2011-05-11 Mahnoosh Kholghi, Mohammadreza Keyvanpour

DESlib: A Dynamic ensemble selection library in Python

DESlib is an open-source Python library providing the implementation of several dynamic selection techniques. The library is divided into three modules: (i) dcs, containing the implementation of dynamic classifier selection methods…

Machine Learning · Computer Science 2020-03-06 Rafael M. O. Cruz, Luiz G. Hafemann, Robert Sabourin, George D. C. Cavalcanti

Imbalanced Data Stream Classification using Dynamic Ensemble Selection

Modern streaming data categorization faces significant challenges from concept drift and class imbalanced data. This negatively impacts the output of the classifier, leading to improper classification. Furthermore, other factors such as the…

Machine Learning · Computer Science 2023-09-29 Priya. S, Haribharathi Sivakumar, Vijay Arvind. R

Curriculum Dataset Distillation

Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. Recent research has begun to explore scalable disentanglement methods. However, there are still…

Computer Vision and Pattern Recognition · Computer Science 2025-07-14 Zhiheng Ma, Anjia Cao, Funing Yang, Yihong Gong, Xing Wei

River: machine learning for streaming data in Python

River is a machine learning library for dynamic data streams and continual learning. It provides multiple state-of-the-art learning methods, data generators/transformers, performance metrics and evaluators for different stream learning…

Machine Learning · Computer Science 2020-12-10 Jacob Montiel, Max Halford, Saulo Martiello Mastelini, Geoffrey Bolmier, Raphael Sourty, Robin Vaysse, Adil Zouitine, Heitor Murilo Gomes, Jesse Read, Talel Abdessalem, Albert Bifet

Towards Free Data Selection with General-Purpose Models

A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets. However, current approaches, represented by active learning methods, typically follow a…

Computer Vision and Pattern Recognition · Computer Science 2023-10-17 Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan

The Evolution of Dataset Distillation: Toward Scalable and Generalizable Solutions

Dataset distillation, which condenses large-scale datasets into compact synthetic representations, has emerged as a critical solution for training modern deep learning models efficiently. While prior surveys focus on developments before…

Computer Vision and Pattern Recognition · Computer Science 2025-07-04 Ping Liu, Jiawei Du

Data Stream Clustering: A Review

The number of connected devices is steadily increasing, and these devices continuously generate data streams. Real-time processing of data streams is attracting interest despite many challenges. Clustering is one of the most suitable methods for…

Machine Learning · Computer Science 2020-07-22 Alaettin Zubaroğlu, Volkan Atalay

DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval

In this paper, we address the problem of high performance and computationally efficient content-based video retrieval in large-scale datasets. Current methods typically propose either: (i) fine-grained approaches employing spatio-temporal…

Computer Vision and Pattern Recognition · Computer Science 2022-08-08 Giorgos Kordopatis-Zilos, Christos Tzelepis, Symeon Papadopoulos, Ioannis Kompatsiaris, Ioannis Patras

A Kernel Two-sample Test for Dynamical Systems

Evaluating whether data streams are drawn from the same distribution is at the heart of various machine learning problems. This is particularly relevant for data generated by dynamical systems since such systems are essential for many…

Machine Learning · Statistics 2022-09-07 Friedrich Solowjow, Dominik Baumann, Christian Fiedler, Andreas Jocham, Thomas Seel, Sebastian Trimpe

A Clustering-based Framework for Classifying Data Streams

The non-stationary nature of data streams strongly challenges traditional machine learning techniques. Although some solutions have been proposed to extend traditional machine learning techniques for handling data streams, these approaches…

Machine Learning · Computer Science 2021-06-23 Xuyang Yan, Abdollah Homaifar, Mrinmoy Sarkar, Abenezer Girma, Edward Tunstel

DPASF: A Flink Library for Streaming Data preprocessing

Data preprocessing techniques are devoted to correcting or alleviating errors in data. Discretization and feature selection are two of the most widespread data preprocessing techniques. Although we can find many proposals for static Big Data…

Databases · Computer Science 2018-10-16 Alejandro Alcalde-Barros, Diego García-Gil, Salvador García, Francisco Herrera

Document stream clustering: experimenting an incremental algorithm and AR-based tools for highlighting dynamic trends

We address here two major challenges presented by dynamic data mining: 1) the stability challenge: we have implemented a rigorous incremental density-based clustering algorithm, independent from any initial conditions and ordering of the…

Artificial Intelligence · Computer Science 2008-11-04 Alain Lelu, Martine Cadot, Pascal Cuxac

DIET: Learning to Distill Dataset Continually for Recommender Systems

Modern deep recommender models are trained under a continual learning paradigm, relying on massive and continuously growing streaming behavioral logs. In large-scale platforms, retraining models on full historical data for architecture…

Information Retrieval · Computer Science 2026-03-27 Jiaqing Zhang, Hao Wang, Mingjia Yin, Bo Chen, Qinglin Jia, Rui Zhou, Ruiming Tang, ChaoYi Ma, Enhong Chen

SimiSketch: Efficiently Estimating Similarity of streaming Multisets

The challenge of estimating similarity between sets has been a significant concern in data science, finding diverse applications across various domains. However, previous approaches, such as MinHash, have predominantly centered around…

Data Structures and Algorithms · Computer Science 2024-05-31 Fenghao Dong, Yang He, Yutong Liang, Zirui Liu, Yuhan Wu, Peiqing Chen, Tong Yang

A Comprehensive Survey of Dataset Distillation

Deep learning technology has developed unprecedentedly in the last decade and has become the primary choice in many application domains. This progress is mainly attributed to a systematic collaboration in which rapidly growing computing…

Machine Learning · Computer Science 2023-12-27 Shiye Lei, Dacheng Tao

Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data

Practical tools for clustering streaming data must be fast enough to handle the arrival rate of the observations. Typically, they also must adapt on the fly to possible lack of stationarity; i.e., the data statistics may be time-dependent…

Machine Learning · Computer Science 2022-03-01 Or Dinari, Oren Freifeld

Evolving Text Data Stream Mining

A text stream is an ordered sequence of text documents generated over time. A massive amount of such text data is generated by online social platforms every day. Designing an algorithm for such text streams to extract useful information is…

Information Retrieval · Computer Science 2024-09-04 Jay Kumar