Related papers: Scikit-Multiflow: A Multi-output Streaming Framewo…

River: machine learning for streaming data in Python

River is a machine learning library for dynamic data streams and continual learning. It provides multiple state-of-the-art learning methods, data generators/transformers, performance metrics and evaluators for different stream learning…

Machine Learning · Computer Science 2020-12-10 Jacob Montiel, Max Halford, Saulo Martiello Mastelini, Geoffrey Bolmier, Raphael Sourty, Robin Vaysse, Adil Zouitine, Heitor Murilo Gomes, Jesse Read, Talel Abdessalem, Albert Bifet

scikit-dyn2sel -- A Dynamic Selection Framework for Data Streams

Mining data streams is a challenge per se. It must be ready to deal with an enormous amount of data and with problems not present in batch machine learning, such as concept drift. Therefore, applying a batch-designed technique, such as…

Machine Learning · Computer Science 2020-08-21 Lucca Portes Cavalheiro, Jean Paul Barddal, Alceu de Souza Britto, Laurent Heutte

A scikit-based Python environment for performing multi-label classification

scikit-multilearn is a Python library for performing multi-label classification. The library is compatible with the scikit/scipy ecosystem and uses sparse matrices for all internal operations. It provides native Python implementations of…

Machine Learning · Computer Science 2018-12-11 Piotr Szymański, Tomasz Kajdanowicz

Scikit-learn: Machine Learning in Python

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a…

Machine Learning · Computer Science 2018-06-06 Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay

stream-learn -- open-source Python library for difficult data stream batch analysis

stream-learn is a Python package compatible with scikit-learn and developed for the drifting and imbalanced data stream analysis. Its main component is a stream generator, which allows to produce a synthetic data stream that may incorporate…

Machine Learning · Computer Science 2020-01-31 Paweł Ksieniewicz, Paweł Zyblewski

HyperStream: a Workflow Engine for Streaming Data

This paper describes HyperStream, a large-scale, flexible and robust software package, written in the Python language, for processing streaming data with workflow creation capabilities. HyperStream overcomes the limitations of other…

Machine Learning · Computer Science 2019-08-09 Tom Diethe, Meelis Kull, Niall Twomey, Kacper Sokol, Hao Song, Miquel Perello-Nieto, Emma Tonkin, Peter Flach

Pilot-Streaming: A Stream Processing Framework for High-Performance Computing

An increasing number of scientific applications rely on stream processing for generating timely insights from data feeds of scientific instruments, simulations, and Internet-of-Things (IoT) sensors. The development of streaming applications…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-13 Andre Luckow, George Chantzialexiou, Shantenu Jha

Standardized Evaluation of Machine Learning Methods for Evolving Data Streams

Due to the unspecified and dynamic nature of data streams, online machine learning requires powerful and flexible solutions. However, evaluating online machine learning methods under realistic conditions is difficult. Existing work…

Machine Learning · Computer Science 2022-04-29 Johannes Haug, Effi Tramountani, Gjergji Kasneci

FastFlow tutorial

FastFlow is a structured parallel programming framework targeting shared memory multicores. Its layered design and the optimized implementation of the communication mechanisms used to implement the FastFlow streaming networks provided to…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-04-25 Marco Aldinucci, Marco Danelutto, Massimo Torquati

FastFlow: Efficient Parallel Streaming Applications on Multi-core

Shared memory multiprocessors come back to popularity thanks to rapid spreading of commodity multi-core architectures. As ever, shared memory programs are fairly easy to write and quite hard to optimise; providing multi-core programmers…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-09-10 Marco Aldinucci, Massimo Torquati, Massimiliano Meneghin

Dflow, a Python framework for constructing cloud-native AI-for-Science workflows

In the AI-for-science era, scientific computing scenarios such as concurrent learning and high-throughput computing demand a new generation of infrastructure that supports scalable computing resources and automated workflow management on…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-30 Xinzijian Liu, Yanbo Han, Zhuoyuan Li, Jiahao Fan, Chengqian Zhang, Jinzhe Zeng, Yifan Shan, Yannan Yuan, Wei-Hong Xu, Yun-Pei Liu, Yuzhi Zhang, Tongqi Wen, Darrin M. York, Zhicheng Zhong, Hang Zheng, Jun Cheng, Linfeng Zhang, Han Wang

SciDataFlow: A Tool for Improving the Flow of Data through Science

Managing data and code in open scientific research is complicated by two key problems: large datasets often cannot be stored alongside code in repository platforms like GitHub, and iterative analysis can lead to unnoticed changes to data,…

Digital Libraries · Computer Science 2023-11-10 Vince Buffalo

Scikit-network: Graph Analysis in Python

Scikit-network is a Python package inspired by scikit-learn for the analysis of large graphs. Graphs are represented by their adjacency matrix in the sparse CSR format of SciPy. The package provides state-of-the-art algorithms for ranking,…

Social and Information Networks · Computer Science 2020-09-17 Thomas Bonald, Nathan de Lara, Quentin Lutz, Bertrand Charpentier

StreamingHub: Interactive Stream Analysis Workflows

Reusable data/code and reproducible analyses are foundational to quality research. This aspect, however, is often overlooked when designing interactive stream analysis workflows for time-series data (e.g., eye-tracking data). A mechanism to…

Databases · Computer Science 2022-06-20 Yasith Jayawardana, Vikas G. Ashok, Sampath Jayarathna

Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems

Streaming Speech-to-Text Translation (StreamST) requires producing translations concurrently with incoming speech, imposing strict latency constraints and demanding models that balance partial-information decision-making with high…

Computation and Language · Computer Science 2025-12-22 Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli

SciCat: A Curated Dataset of Scientific Software Repositories

The proliferation of open-source scientific software for science and research presents opportunities and challenges. In this paper, we introduce the SciCat dataset -- a comprehensive collection of Free-Libre Open Source Software (FLOSS)…

Software Engineering · Computer Science 2023-12-12 Addi Malviya-Thakur, Reed Milewicz, Lavinia Paganini, Ahmed Samir Imam Mahmoud, Audris Mockus

FluidDyn: a Python open-source framework for research and teaching in fluid dynamics

FluidDyn is a project to foster open-science and open-source in the fluid dynamics community. It is thought of as a research project to channel open-source dynamics, methods and tools to do science. We propose a set of Python packages…

Other Computer Science · Computer Science 2019-04-10 Pierre Augier, Ashwin Vishnu Mohanan, Cyrille Bonamy

SciWING -- A Software Toolkit for Scientific Document Processing

We introduce SciWING, an open-source software toolkit which provides access to pre-trained models for scientific document processing tasks, inclusive of citation string parsing and logical structure recovery. SciWING enables researchers to…

Digital Libraries · Computer Science 2020-10-26 Abhinav Ramesh Kashyap, Min-Yen Kan

API design for machine learning software: experiences from the scikit-learn project

Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design…

Machine Learning · Computer Science 2013-09-03 Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake Vanderplas, Arnaud Joly, Brian Holt, Gaël Varoquaux

Causify DataFlow: A Framework For High-performance Machine Learning Stream Computing

We present DataFlow, a computational framework for building, testing, and deploying high-performance machine learning systems on unbounded time-series data. Traditional data science workflows assume finite datasets and require substantial…

Machine Learning · Computer Science 2026-01-01 Giacinto Paolo Saggese, Paul Smith