Related papers: mPyPl: Python Monadic Pipeline Library for Complex…
The advent of modern data processing has led to an increasing tendency towards interdisciplinarity, which frequently involves the importation of different technical approaches. Consequently, there is an urgent need for a unified data…
PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that enables the construction of robust, scalable workflows for either generating or processing voluminous datasets. A workflow is created from user-written…
Open-source process mining provides many algorithms for the analysis of event data which could be used to analyze mainstream processes (e.g., O2C, P2P, CRM). However, compared to commercial tools, they lack the performance and struggle to…
Misconceptions about program execution hinder many novice programmers. We introduce SimpliPy, a notional machine designed around a carefully chosen Python subset to clarify core control flow and scoping concepts. Its foundation is a precise…
Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python…
In this report, we present a new programming model based on Pipelines and Operators, which are the building blocks of programs written in PiCo, a DSL for Data Analytics Pipelines. In the model we propose, we use the term Pipeline to denote…
A large amount of data is produced every second from modern information systems such as mobile devices, the world wide web, Internet of Things, social media, etc. Analysis and mining of this massive data requires a lot of advanced tools and…
Multiple Kernel Learning is a recent and powerful paradigm to learn the kernel function from data. In this paper, we introduce MKLpy, a python-based framework for Multiple Kernel Learning. The library provides Multiple Kernel Learning…
We present DeepAL, a Python library that implements several common strategies for active learning, with a particular emphasis on deep active learning. DeepAL provides a simple and unified framework based on PyTorch that allows users to…
The recomputability and reproducibility of results from scientific software requires access to both the source code and all associated input and output data. However, the full collection of these resources often does not accompany the key…
Machine learning solutions are very popular in the field of chemoinformatics, where they have numerous applications, such as novel drug discovery or molecular property prediction. Molecular fingerprints are algorithms commonly used for…
Python is a particularly appealing language to carry out data analysis, owing in part to its user-friendly character as well as its access to well maintained and powerful libraries like NumPy and SciPy. Still, for the purpose of analyzing…
Linear operators and optimisation are at the core of many algorithms used in signal and image processing, remote sensing, and inverse problems. For small to medium-scale problems, existing software packages (e.g., MATLAB, Python numpy and…
Process-mining techniques have emerged as powerful tools for analyzing event data to gain insights into business processes. In this paper, we present a comprehensive analysis of road traffic fine management processes using the pm4py library…
We present an open source Python 3 library aimed at practitioners of molecular simulation, especially Monte Carlo simulation. The aims of the library are to facilitate the generation of simulation data for a wide range of problems; and to…
With the recent rapid progress in the study of deep generative models (DGMs), there is a need for a framework that can implement them in a simple and generic way. In this research, we focus on two features of DGMs: (1) deep neural networks…
PySEMTools is a Python-based library for post-processing simulation data produced with high-order hexahedral elements in the context of the spectral element method in computational fluid dynamics. It aims to minimize intermediate steps…
River is a machine learning library for dynamic data streams and continual learning. It provides multiple state-of-the-art learning methods, data generators/transformers, performance metrics and evaluators for different stream learning…
In recent years, there has been increasing interest in network diffusion models and related problems. The most popular of these are the independent cascade and linear threshold models. Much of the recent experimental work done on these…
Nowadays machine learning (ML) practitioners have access to numerous ML libraries available online. Such libraries can be used to create ML pipelines that consist of a series of steps where each step may invoke up to several ML libraries…