Digital Libraries
To facilitate the review, evaluation and analysis of scientific literature, the lit-tag R Shiny application provides a convenient interface for users to generate a citation database with custom, user-defined tags and notes. Lit-tag is not…
Accurate parsing of citations is necessary for machine-readable scholarly infrastructure. Despite sustained interest in this problem, existing evaluation techniques are often not generalizable, based on synthetic data, or not publicly…
This study investigates the simultaneous use of multiple metadata schemas at research data repositories. The analysis covers how eight disciplinary research data repositories from the geosciences and social sciences use disciplinary…
Citations are essential for recognizing scientific contributions, yet citation behavior is shaped by more than just relevance or quality. We analyzed approximately 255,000 refereed astronomy articles published between 2000 and 2025 to…
Although bibliometrics has become an essential tool in the evaluation of research performance, bibliometric analyses are sensitive to a range of methodological choices. Subtle choices in data selection, indicator construction, and modeling…
Establishing an independent academic identity is a central yet insufficiently understood challenge for early-career researchers. However, limited resources and mentor-driven research agendas often constrain early efforts toward autonomy. To…
Standard citation metrics treat all citations as equal, obscuring the social and structural pathways through which scholarly influence propagates. I introduce Citation-Constellation, a freely available no-code tool for citation network…
Funding acknowledgments in scholarly publications provide large-scale trace data on organizations that support scientific research. We present a dataset for linking global science funding organizations to research publications by…
Equal-contribution authorship, in which two or more authors are designated as having contributed equally, is increasingly common in scientific publishing. Using approximately 480,000 tagged records from PubMed and PMC (2010-2024), we…
With the move towards open research information, the DOI registration agency DataCite is increasingly used as a source for metadata describing research data, for example to perform scientometric analyses. However, there is a lack of…
Currently, there is limited research investigating the phenomenon of research data repositories being shut down, and the impact this has on the long-term availability of data. This paper takes an infrastructure perspective on the…
Gender imbalance persists across science, technology, engineering, and mathematics (STEM) fields, including computer science, where it appears in researcher demographics, productivity, recognition, hiring, and career progression. Given…
Large language models such as ChatGPT have increased scholarly output, but whether this productivity boost produces genuine intellectual advancement remains untested. I address this gap by measuring the semantic novelty of 13,847 articles…
Public research funding agencies increasingly seek to steer health research toward higher levels of translation and societal relevance. Yet it remains unclear to what extent such policy shifts are effectively implemented and reflected in…
Women and men pursue different but complementary forms of scientific innovation. Analyzing 261,452 solo-authored papers by U.S. scholars, with patterns confirmed by millions of multi-authored articles, we show that women more often bridge…
The efficient management and planning of urban energy systems require integrated three-dimensional (3D) models that accurately represent both consumption nodes and distribution networks. This paper introduces our developed approach and…
We present a review and analysis of scientific paper embellishments -- simple visual elements that are deeply integrated into the text of scientific publications. These embellishments are increasingly used in research papers, which have the…
Digital-humanities work on semantic shift often alternates between handcrafted close readings and opaque embedding machinery. We present a reproducible expert-system-style pipeline that quantifies lexical drift and its instability in the…
Traditional science maps visualize topics by clustering documents within a network, but they are inherently biased toward clustering certain topics over others. If these topics could be chosen, then the science maps could be tailored for…
Unraveling the hierarchical structure-property relationships is the central challenge of materials science, necessitating the interpretation of data across vast physical scales from micro to macro. Despite the rapid integration of Large…