Related papers: DataSist: A Python-based library for easy data ana…

PyTond: Efficient Python Data Science on the Shoulders of Databases

Python data science libraries such as Pandas and NumPy have recently gained immense popularity. Although these libraries are feature-rich and easy to use, their scalability limitations require more robust computational resources. In this…

Databases · Computer Science 2024-07-17 Hesam Shahrokhi, Amirali Kaboli, Mahdi Ghorbani, Amir Shaikhha

gnss_lib_py: Analyzing GNSS Data with Python

This paper presents gnss_lib_py, a Python library used to parse, analyze, and visualize data from a variety of GNSS (Global Navigation Satellite Systems) data sources. The gnss_lib_py library's ease of use, modular capabilities, testing…

Robotics · Computer Science 2024-08-20 Derek Knowles, Ashwin Vivek Kanhere, Daniel Neamati, Grace Gao

GraSPy: Graph Statistics in Python

We introduce GraSPy, a Python library devoted to statistical inference, machine learning, and visualization of random graphs and graph populations. This package provides flexible and easy-to-use algorithms for analyzing and understanding…

Social and Information Networks · Computer Science 2019-10-25 Jaewon Chung, Benjamin D. Pedigo, Eric W. Bridgeford, Bijan K. Varjavand, Hayden S. Helm, Joshua T. Vogelstein

Niimpy: a toolbox for behavioral data analysis

Behavioral studies using personal digital devices typically produce rich longitudinal datasets of mixed data types. These data provide information about the behavior of users of these devices in real-time and in the users' natural…

Human-Computer Interaction · Computer Science 2022-12-06 A. Ikäheimonen, A. M. Triana, N. Luong, A. Ziaei, J. Rantaharju, R. Darst, T. Aledavood

PyMatterSim: a Python Data Analysis Library for Computer Simulations of Materials Science, Physics, Chemistry, and Beyond

Computer simulation has become one of the most important tools in scientific research in many disciplines. Benefiting from the dynamical trajectories regulated by versatile interatomic interactions, various material properties can be…

Materials Science · Physics 2024-11-28 Y.-C. Hu, J. Tian

GrASP: A Library for Extracting and Exploring Human-Interpretable Textual Patterns

Data exploration is an important step of every data science and machine learning project, including those involving textual data. We provide a novel language tool, in the form of a publicly available Python library for extracting patterns…

Computation and Language · Computer Science 2022-06-20 Piyawat Lertvittayakumjorn, Leshem Choshen, Eyal Shnarch, Francesca Toni

Landscape of High-performance Python to Develop Data Science and Machine Learning Applications

Python has become the prime language for application development in the Data Science and Machine Learning domains. However, data scientists are not necessarily experienced programmers. While Python lets them quickly implement their…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-24 Oscar Castro, Pierrick Bruneau, Jean-Sébastien Sottet, Dario Torregrossa

PyRDM: A Python-based library for automating the management and online publication of scientific software and data

The recomputability and reproducibility of results from scientific software requires access to both the source code and all associated input and output data. However, the full collection of these resources often does not accompany the key…

Computational Engineering, Finance, and Science · Computer Science 2015-12-24 Christian T. Jacobs, Alexandros Avdis, Gerard J. Gorman, Matthew D. Piggott

Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence

Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and the methods that are driving it, from processing the…

Machine Learning · Computer Science 2020-04-01 Sebastian Raschka, Joshua Patterson, Corey Nolet

Generating large-scale network analyses of scientific landscapes in seconds using Dimensions on Google BigQuery

The growth of large, programmatically accessible bibliometrics databases presents new opportunities for complex analyses of publication metadata. In addition to providing a wealth of information about authors and institutions, databases such…

Digital Libraries · Computer Science 2023-01-26 Michele Pasin, Richard Abdill

DataAssist: A Machine Learning Approach to Data Cleaning and Preparation

Current automated machine learning (ML) tools are model-centric, focusing on model selection and parameter optimization. However, the majority of the time in data analysis is devoted to data cleaning and wrangling, for which limited tools…

Machine Learning · Computer Science 2023-07-18 Kartikay Goyle, Quin Xie, Vakul Goyle

Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science

Process mining, i.e., a sub-field of data science focusing on the analysis of event data generated during the execution of (business) processes, has seen a tremendous change over the past two decades. Starting off in the early 2000s, with…

Software Engineering · Computer Science 2019-05-16 Alessandro Berti, Sebastiaan J. van Zelst, Wil van der Aalst

Basic Data Analysis and More - A Guided Tour Using Python

In these lecture notes, a selection of frequently required statistical tools will be introduced and illustrated. They make it possible to post-process data that stem from, e.g., large-scale numerical simulations (aka sequence of random experiments).…

Data Analysis, Statistics and Probability · Physics 2012-07-26 O. Melchert

Minimalist Data Wrangling with Python

Minimalist Data Wrangling with Python is envisaged as a student's first introduction to data science, providing a high-level overview as well as discussing key concepts in detail. We explore methods for cleaning data gathered from different…

Machine Learning · Computer Science 2022-11-10 Marek Gagolewski

PyPOTS: A Python Toolkit for Machine Learning on Partially-Observed Time Series

PyPOTS is an open-source Python library dedicated to data mining and analysis on multivariate partially-observed time series with missing values. Particularly, it provides easy access to diverse algorithms categorized into five tasks:…

Machine Learning · Computer Science 2025-07-10 Wenjie Du, Yiyuan Yang, Linglong Qian, Jun Wang, Qingsong Wen

BEANS - a software package for distributed Big Data analysis

BEANS software is a web based, easy to install and maintain, new tool to store and analyse data in a distributed way for a massive amount of data. It provides a clear interface for querying, filtering, aggregating, and plotting data from an…

Instrumentation and Methods for Astrophysics · Physics 2016-03-25 Arkadiusz Hypki

CyNetDiff -- A Python Library for Accelerated Implementation of Network Diffusion Models

In recent years, there has been increasing interest in network diffusion models and related problems. The most popular of these are the independent cascade and linear threshold models. Much of the recent experimental work done on these…

Social and Information Networks · Computer Science 2024-04-29 Eliot W. Robson, Dhemath Reddy, Abhishek K. Umrawal

dynsight: an Open Python Platform for Simulation and Experimental Trajectory Data Analysis

The study of complex many-body systems via analysis of the trajectories of the units that dynamically move and interact within them is a non-trivial task. The workflow for extracting meaningful information from the raw trajectory data is…

Materials Science · Physics 2025-10-31 Simone Martino, Matteo Becchi, Andrew Tarzia, Daniele Rapetti, Giovanni M. Pavan

Scikit-mobility: a Python library for the analysis, generation and risk assessment of mobility data

The last decade has witnessed the emergence of massive mobility data sets, such as tracks generated by GPS devices, call detail records, and geo-tagged posts from social media platforms. These data sets have fostered a vast scientific…

Physics and Society · Physics 2021-06-07 Luca Pappalardo, Filippo Simini, Gianni Barlacchi, Roberto Pellungrini

Python for Smarter Cities: Comparison of Python libraries for static and interactive visualisations of large vector data

Local governments, as part of 'smart city' initiatives and to promote interoperability, are increasingly incorporating open-source software into their data management, analysis, and visualisation workflows. Python, with its concise and…

Computers and Society · Computer Science 2022-03-01 Gregor Herda, Robert McNabb