Related papers: SciWING -- A Software Toolkit for Scientific Docum…

200 papers

Scientific advancement relies on the ability to share and reproduce results. When data analysis or calculations are carried out using software written by scientists there are special challenges around code versions, quality and code…

Software Engineering · Computer Science 2025-07-09 S. Lee, C. Myers, A. Yang, T. Zhang, S. J. L. Billinge

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a…

We introduce SciEvalKit, a unified benchmarking toolkit designed to evaluate AI models for science across a broad range of scientific disciplines and task capabilities. Unlike general-purpose evaluation platforms, SciEvalKit focuses on the…

Scientific writing involves retrieving, summarizing, and citing relevant papers, which can be time-consuming processes in large and rapidly evolving fields. By making these processes inter-operable, natural language processing (NLP)…

Computation and Language · Computer Science 2023-11-07 Nianlong Gu, Richard H. R. Hahnloser

swdatatoolkit is a Python-based scientific software library designed to support the acquisition, preprocessing, and analysis of solar and space weather data. The toolkit consolidates functionality across multiple domains, including data…

Instrumentation and Methods for Astrophysics · Physics 2026-04-27 Dustin Kempton, Griffin Goodwin, Tarun Kumar Reddy Thippareddy, Reet Gupta, Viacheslav Sadykov, Rafal Angryk

We introduce Docling, an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion, that can parse several types of popular document formats into a unified, richly structured representation. It is powered by…

scikit-image is an image processing library that implements algorithms and utilities for use in research, education and industry applications. It is released under the liberal "Modified BSD" open source license, provides a well-documented…

The proliferation of open-source scientific software for science and research presents opportunities and challenges. In this paper, we introduce the SciCat dataset -- a comprehensive collection of Free-Libre Open Source Software (FLOSS)…

Software Engineering · Computer Science 2023-12-12 Addi Malviya-Thakur, Reed Milewicz, Lavinia Paganini, Ahmed Samir Imam Mahmoud, Audris Mockus

The academic literature of social sciences records human civilization and studies human social problems. With its large-scale growth, the ways to quickly find existing research on relevant issues have become an urgent demand for…

Computation and Language · Computer Science 2022-11-28 Si Shen, Jiangfeng Liu, Litao Lin, Ying Huang, Lin Zhang, Chang Liu, Yutong Feng, Dongbo Wang

The Statistical Toolkit is an open source system specialized in the statistical comparison of distributions. It addresses requirements common to different experimental domains, such as simulation validation (e.g. comparison of experimental…

Computational Physics · Physics 2015-06-11 M Batic, A. M. Paganoni, A. Pfeiffer, M. G. Pia, A. Ribon

The increasing importance of Computational Science and Engineering has highlighted the need for high-quality scientific software. However, research software development is often hindered by limited funding, time, staffing, and technical…

Software Engineering · Computer Science 2025-03-10 Armin Ariamajd, Raquel López-Ríos de Castro, Andrea Volkamer

Managing data and code in open scientific research is complicated by two key problems: large datasets often cannot be stored alongside code in repository platforms like GitHub, and iterative analysis can lead to unnoticed changes to data,…

Digital Libraries · Computer Science 2023-11-10 Vince Buffalo

The library scikit-fda is a Python package for Functional Data Analysis (FDA). It provides a comprehensive set of tools for representation, preprocessing, and exploratory analysis of functional data. The library is built upon and integrated…

Modern science clearly demands a higher level of reproducibility and collaboration. To make research fully reproducible one has to take care of several aspects: research protocol description, data access, environment preservation,…

Computers and Society · Computer Science 2017-12-06 Andrey Ustyuzhanin, Timothy Daniel Head, Igor Babuschkin, Alexander Tiunov

Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale…

Computation and Language · Computer Science 2019-09-12 Iz Beltagy, Kyle Lo, Arman Cohan

Large language models (LLMs) have been widely applied in various practical applications, typically comprising billions of parameters, with inference processes requiring substantial energy and computational resources. In contrast, the human…

Software Engineering · Computer Science 2024-12-23 Xin Du, Shifan Ye, Qian Zheng, Yangfan Hu, Rui Yan, Shunyu Qi, Shuyang Chen, Huajin Tang, Gang Pan, Shuiguang Deng

Automated scientific discovery with large language models is transforming the research lifecycle from ideation to experimentation, yet existing agents struggle to autonomously process raw data collected from scientific experiments. We…

Artificial Intelligence · Computer Science 2026-04-29 Ke Lin, Yilin Lu, Shreyas Bhat, Xuehang Guo, Junier Oliva, Qingyun Wang

Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage democratization of stream learning research, it provides multiple state of…

Machine Learning · Computer Science 2020-05-18 Jacob Montiel, Jesse Read, Albert Bifet, Talel Abdessalem

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper…

We introduce jiant, an open source toolkit for conducting multitask and transfer learning experiments on English NLU tasks. jiant enables modular and configuration-driven experimentation with state-of-the-art models and implements a broad…

Computation and Language · Computer Science 2020-05-14 Yada Pruksachatkun, Phil Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Wang, Ian Tenney, Samuel R. Bowman