Digital Libraries
The present study focuses on persistence in research productivity over the course of an individual's entire scientific career. We track 'late-career' scientists - scientists with at least 25 years of publishing experience (N=320,564) - in…
Bibliographic data is a rich source of information that goes beyond the use cases of location and citation; it also encodes both cultural and technological context. For most of its existence, the scholarly record has changed slowly and…
In this study, the global scientific workforce is explored through large-scale, generational, cross-sectional, and longitudinal approaches. We examine 4.3 million nonoccasional scientists from 38 OECD countries publishing in 1990-2021. Our…
As Open Access continues to gain importance in science policy, understanding the proportion of Open Access publications relative to the total research output of research-performing organizations, individual countries, or even globally has…
Post-publication peer review (PPPR) has emerged as an important supplement to traditional peer review, with social media playing a growing role in publicising potential problems in published research. However, it remains unclear whether…
This paper presents a modular AI agentic skill pipeline for automating subject indexing with Library of Congress Subject Headings (LCSH). Subject indexing - the process of analyzing a work's aboutness, selecting controlled vocabulary terms,…
The unprecedented proliferation of digital data presents significant challenges in access, integration, and value creation across all data-intensive sectors. Valuable information is frequently encapsulated within disparate systems,…
Contemporary scientometric indicators remain anchored in paradigms and axioms from when academic research was conducted in small scholarly communities. With the global proliferation of scientific research, academia is now organized in large…
HERITRACE is an open-source web application that enables users without Semantic Web expertise to curate RDF data through form-based interfaces with automatic provenance documentation and change tracking in RDF. It uses SHACL for data model…
OpenAlex has recently emerged as a leading alternative to proprietary bibliometric sources. However, concerns remain regarding the quality of its metadata, especially the institutional profiles which are crucial for evaluating…
Numerous metascience studies and other initiatives have begun to monitor the prevalence of open science practices, although it is more important to understand the 'downstream' effects or impacts of open science. PLOS and DataSeer have developed…
Cross-national comparison of research funding projects is increasingly important for science policy and strategic planning, but language differences remain a major obstacle. In particular, KAKENHI project descriptions are written primarily…
Objectives. Major research and implementation efforts have been devoted to indexing articles according to the major topics discussed, but much less effort to indexing their publication types and study designs (collectively, PTs). In this…
Intergovernmental organizations (IGOs) increasingly rely on scientific evidence, yet the pathways through which scientific research enters policy remain opaque. By linking 230,737 scientific papers cited in IGO policy documents (2015-2023)…
Do e-scooter speed governance policies yield behavioral safety gains beyond the mechanical cap they impose? A firmware ceiling mechanically prevents speeding, but whether the same riders also generate fewer harsh accelerations and harsh…
This study develops and evaluates a systematic methodology for constructing news datasets from Google News, combining automated web scraping, large language model (LLM)-based metadata extraction, and SCImago Media Rankings enrichment. Using…
ACM and IEEE are the two premier associations in computing and electrical/electronics engineering; they publish and organize, respectively, the great majority of periodicals and conferences serving these disciplines. Science is a constantly…
Our paper introduces a generative, multiagent AI framework designed to overcome the rigidity, limited flexibility and technical barriers of current bibliometric tools. The objective is to enable researchers to perform fully dynamic,…
Large language models (LLMs) have demonstrated remarkable versatility across a wide range of natural language processing tasks and domains. One such task is Named Entity Recognition (NER), which involves identifying and classifying proper…
This study analyses the publication activity and migration patterns of Ukrainian scholars in the social sciences and humanities (SSH) during the initial two years of the Russo-Ukrainian war. Focusing on scholars who published at least three…