Digital Libraries
Open Science has become a central framework for promoting transparency, accessibility, and inclusiveness in scholarly research. While the Digital Humanities (DH) community has long embraced openness in terms of research outputs, less…
Science advances not only through the accumulation of facts but also through the evolution of tools. Crucially, tools are rarely used in isolation. They form tool portfolios, combinations shaped by a discipline's workflows and analytical…
We present a semantic-structural atlas of transportation research built from 120,323 papers across 34 peer-reviewed journals published between 1967 and 2025, roughly an order of magnitude larger than and a decade beyond Sun and…
In the academic landscape, scientific research has been primarily conducted through research institutions, which requires a massive influx of funds from various sources. Presently, these funding bodies have been moving from trust-based…
This study presents a large-scale network dataset, NIH-MPINet, curated from NIH RePORTER and PubMed, characterizing collaboration among multiple Principal Investigators (multi-PIs) on NIH R01-equivalent grants from 2006 to 2023. The network…
Federal research funding shapes the direction, diversity, and impact of the US scientific enterprise. Large language models (LLMs) are rapidly diffusing into scientific practice, holding substantial promise while raising widespread…
Paper mills are a growing threat to the integrity of science, yet their penetration in conference proceedings remains underexplored despite conferences being more important than journals in some scientific subfields. This study aims to…
The debate about scholarly knowledge infrastructure has long been framed as a contest between openness and commercial enclosure. This framing distorts both policy and practice. The real tension lies between the persistent cost of producing…
This study investigates the correlation of citation impact with various open science indicators (OSI) within the French Open Science Monitor (FOSM), a dataset comprising approximately 900,000 publications by French authors from…
This study describes the methodology and analyses the results of the process of mapping entities between two large open bibliographic metadata collections, OpenCitations Meta and OpenAlex. The primary objective of this mapping is to…
Scientific posters are one of the most common forms of scholarly communication and contain early-stage insights with potential to accelerate scientific discovery. We investigated where posters are shared, to what extent their sharing aligns…
This paper presents a comprehensive dataset of doctoral theses defended in France between 1985 and 2025, constructed from multiple national academic metadata sources. The dataset is primarily based on data from the French national thesis…
Assessing a cited paper's impact is typically done by analyzing its citation context in isolation within the citing paper. While this focuses on the most directly relevant text, it prevents relative comparisons across all the works a paper…
The probability folder of Mathlib, Lean's mathematical library, makes heavy use of Markov kernels. We present their definition and properties and describe the formalization of the disintegration theorem for Markov kernels. That theorem is…
OpenCitations Meta is a new database for open bibliographic metadata of scholarly publications involved in the citations indexed by the OpenCitations infrastructure, adhering to Open Science principles and published under a CC0 license to…
Research methods constitute an indispensable tool for scholars engaged in scientific inquiry. Investigating how scholars use research methods throughout their careers can reveal distinct patterns in method adoption, providing valuable…
Scientific tools dictate the boundaries of human knowledge, serving as the foundation for perceptions and explorations. In the era of Big Science, science is increasingly dependent on advanced analytical technologies and experimental…
Open-source scientific software is a major driver of scientific progress, yet its development and reuse remain difficult in collaborative settings. Researchers repeatedly face four recurring challenges: discovering and reproducing existing…
In modern scientific collaboration networks, certain researchers play a pivotal role in bridging scholars who have never worked together, a phenomenon we term academic "match-makers." Despite their potential importance, the prevalence,…
Large Language Models (LLMs) can be helpful for literature search and summarisation, but retracted articles can confuse them. This article asks three open-weights (offline) LLMs whether 161 high-profile retracted articles had been…