Related papers: SciWING -- A Software Toolkit for Scientific Docum…

scikit-package -- software packaging standards and roadmap for sharing reproducible scientific software

Scientific advancement relies on the ability to share and reproduce results. When data analysis or calculations are carried out using software written by scientists there are special challenges around code versions, quality and code…

Software Engineering · Computer Science 2025-07-09 S. Lee, C. Myers, A. Yang, T. Zhang, S. J. L. Billinge

Scikit-learn: Machine Learning in Python

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a…

Machine Learning · Computer Science 2018-06-06 Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

We introduce SciEvalKit, a unified benchmarking toolkit designed to evaluate AI models for science across a broad range of scientific disciplines and task capabilities. Unlike general-purpose evaluation platforms, SciEvalKit focuses on the…

Artificial Intelligence · Computer Science 2026-01-07 Yiheng Wang, Yixin Chen, Shuo Li, Yifan Zhou, Bo Liu, Hengjian Gao, Jiakang Yuan, Jia Bu, Wanghan Xu, Yuhao Zhou, Xiangyu Zhao, Zhiwang Zhou, Fengxiang Wang, Haodong Duan, Songyang Zhang, Jun Yao, Han Deng, Yizhou Wang, Jiabei Xiao, Jiaqi Liu, Encheng Su, Yujie Liu, Weida Wang, Junchi Yao, Shenghe Zheng, Haoran Sun, Runmin Ma, Xiangchao Yan, Bo Zhang, Dongzhan Zhou, Shufei Zhang, Peng Ye, Xiaosong Wang, Shixiang Tang, Wenlong Zhang, Lei Bai

SciLit: A Platform for Joint Scientific Literature Discovery, Summarization and Citation Generation

Scientific writing involves retrieving, summarizing, and citing relevant papers, which can be time-consuming processes in large and rapidly evolving fields. By making these processes inter-operable, natural language processing (NLP)…

Computation and Language · Computer Science 2023-11-07 Nianlong Gu, Richard H. R. Hahnloser

Describing the swdatatoolkit: A Space Weather Data Analysis Library

swdatatoolkit is a Python-based scientific software library designed to support the acquisition, preprocessing, and analysis of solar and space weather data. The toolkit consolidates functionality across multiple domains, including data…

Instrumentation and Methods for Astrophysics · Physics 2026-04-27 Dustin Kempton, Griffin Goodwin, Tarun Kumar Reddy Thippareddy, Reet Gupta, Viacheslav Sadykov, Rafal Angryk

Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion

We introduce Docling, an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion, that can parse several types of popular document formats into a unified, richly structured representation. It is powered by…

Computation and Language · Computer Science 2025-01-31 Nikolaos Livathinos, Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti, Kasper Dinkla, Yusik Kim, Shubham Gupta, Rafael Teixeira de Lima, Valery Weber, Lucas Morin, Ingmar Meijer, Viktor Kuropiatnyk, Peter W. J. Staar

scikit-image: Image processing in Python

scikit-image is an image processing library that implements algorithms and utilities for use in research, education and industry applications. It is released under the liberal "Modified BSD" open source license, provides a well-documented…

Mathematical Software · Computer Science 2014-07-24 Stefan van der Walt, Johannes L. Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D. Warner, Neil Yager, Emmanuelle Gouillart, Tony Yu, the scikit-image contributors

SciCat: A Curated Dataset of Scientific Software Repositories

The proliferation of open-source scientific software for science and research presents opportunities and challenges. In this paper, we introduce the SciCat dataset -- a comprehensive collection of Free-Libre Open Source Software (FLOSS)…

Software Engineering · Computer Science 2023-12-12 Addi Malviya-Thakur, Reed Milewicz, Lavinia Paganini, Ahmed Samir Imam Mahmoud, Audris Mockus

SsciBERT: A Pre-trained Language Model for Social Science Texts

The academic literature of social sciences records human civilization and studies human social problems. With its large-scale growth, the ways to quickly find existing research on relevant issues have become an urgent demand for…

Computation and Language · Computer Science 2022-11-28 Si Shen, Jiangfeng Liu, Litao Lin, Ying Huang, Lin Zhang, Chang Liu, Yutong Feng, Dongbo Wang

A new development cycle of the Statistical Toolkit

The Statistical Toolkit is an open source system specialized in the statistical comparison of distributions. It addresses requirements common to different experimental domains, such as simulation validation (e.g. comparison of experimental…

Computational Physics · Physics 2015-06-11 M. Batic, A. M. Paganoni, A. Pfeiffer, M. G. Pia, A. Ribon

PyPackIT: Automated Research Software Engineering for Scientific Python Applications on GitHub

The increasing importance of Computational Science and Engineering has highlighted the need for high-quality scientific software. However, research software development is often hindered by limited funding, time, staffing, and technical…

Software Engineering · Computer Science 2025-03-10 Armin Ariamajd, Raquel López-Ríos de Castro, Andrea Volkamer

SciDataFlow: A Tool for Improving the Flow of Data through Science

Managing data and code in open scientific research is complicated by two key problems: large datasets often cannot be stored alongside code in repository platforms like GitHub, and iterative analysis can lead to unnoticed changes to data,…

Digital Libraries · Computer Science 2023-11-10 Vince Buffalo

scikit-fda: A Python Package for Functional Data Analysis

The library scikit-fda is a Python package for Functional Data Analysis (FDA). It provides a comprehensive set of tools for representation, preprocessing, and exploratory analysis of functional data. The library is built upon and integrated…

Computation · Statistics 2024-09-04 Carlos Ramos-Carreño, José Luis Torrecilla, Miguel Carbajo-Berrocal, Pablo Marcos, Alberto Suárez

Everware toolkit. Supporting reproducible science and challenge-driven education

Modern science clearly demands for a higher level of reproducibility and collaboration. To make research fully reproducible one has to take care of several aspects: research protocol description, data access, environment preservation,…

Computers and Society · Computer Science 2017-12-06 Andrey Ustyuzhanin, Timothy Daniel Head, Igor Babuschkin, Alexander Tiunov

SciBERT: A Pretrained Language Model for Scientific Text

Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale…

Computation and Language · Computer Science 2019-09-12 Iz Beltagy, Kyle Lo, Arman Cohan

Darkit: A User-Friendly Software Toolkit for Spiking Large Language Model

Large language models (LLMs) have been widely applied in various practical applications, typically comprising billions of parameters, with inference processes requiring substantial energy and computational resources. In contrast, the human…

Software Engineering · Computer Science 2024-12-23 Xin Du, Shifan Ye, Qian Zheng, Yangfan Hu, Rui Yan, Shunyu Qi, Shuyang Chen, Huajin Tang, Gang Pan, Shuiguang Deng

SciDER: Scientific Data-centric End-to-end Researcher

Automated scientific discovery with large language models is transforming the research lifecycle from ideation to experimentation, yet existing agents struggle to autonomously process raw data collected from scientific experiments. We…

Artificial Intelligence · Computer Science 2026-04-29 Ke Lin, Yilin Lu, Shreyas Bhat, Xuehang Guo, Junier Oliva, Qingyun Wang

Scikit-Multiflow: A Multi-output Streaming Framework

Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage democratization of stream learning research, it provides multiple state of…

Machine Learning · Computer Science 2020-05-18 Jacob Montiel, Jesse Read, Albert Bifet, Talel Abdessalem

SpeechBrain: A General-Purpose Speech Toolkit

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-10 Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, Yoshua Bengio

jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models

We introduce jiant, an open source toolkit for conducting multitask and transfer learning experiments on English NLU tasks. jiant enables modular and configuration-driven experimentation with state-of-the-art models and implements a broad…

Computation and Language · Computer Science 2020-05-14 Yada Pruksachatkun, Phil Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Wang, Ian Tenney, Samuel R. Bowman