Digital Libraries
The large-scale digitization of historical archives has created a paradox: "dark data", digital objects that lack the metadata needed for retrieval. Manual archival description is slow and expensive, limiting discovery and reuse. We propose Vidya, a…
Researchers often begin new projects by conducting a broad State-of-the-Art review before they are ready to define the narrow protocol required by a systematic review. This is especially common in multidisciplinary areas where terminology…
LLM agents routinely serve as first (and sometimes only) readers of academic papers, skimming for sub-claims, extracting reproducibility steps, and generalizing scope. Standard prose papers produce recurring failures in this role:…
Scientific peer review increasingly struggles to assess reproducibility at the scale and complexity of modern research output. Evaluating reproducibility requires reconstructing experimental dependencies, methodological choices, data flows,…
Modern researchers engage in diverse activities, assume multiple contribution roles, and produce a variety of outputs beyond traditional publications. This broader view of research contributions is increasingly recognised by responsible…
Web accessibility aims to ensure that web content and services are usable by people with diverse abilities. In recent years, Large Language Models (LLMs) have been increasingly explored to support accessibility-related tasks on the web,…
This study examines the distribution of Artificial Intelligence (AI) research across European NUTS-3 regions during the period 2015-2024. Using bibliometric data from Clarivate InCites and the Citation Topics classification system, we…
We present SemRepo, an RDF knowledge graph comprising over 81 million triples describing nearly 200,000 GitHub repositories associated with scientific research. SemRepo captures repository-level metadata, such as contributors, issues, and…
Citation graphs are fundamental tools for modeling scientific structure, but are often fragmented due to missing citations of scientifically connected articles. To address this issue, we propose a computationally efficient hybrid framework…
Paper mills produce fraudulent research manuscripts built on recycled tables and figures, or on entirely fabricated data. A more recent pattern has emerged: apparently genuine trials with real patients, but with manipulated statistical…
This exploratory study examines how low-impact journals, defined through subject-normalized Eigenfactor percentiles, are associated with denser and more reciprocating patterns of author-to-author citations. Using Crossref records, we assign…
Scholarly blogs have become an important venue for scholarly communication, yet they remain insufficiently integrated into digital research and information infrastructures, which places their long-term preservation and citability at risk.…
The extent to which Artificial Intelligence (AI) technologies can trigger generalized paradigm shifts in science is unclear. Although these technologies have revolutionized data collection and analysis in specific fields, their overall…
Large language models (LLMs) are increasingly used in scientific research and discovery, supporting tasks ranging from literature retrieval and synthesis to hypothesis generation, autonomous experimentation, and research evaluation.…
Artificial intelligence has recently developed rapidly, with significant interdisciplinary expansion, yet existing studies often treat it as a single whole and lack systematic long-term subfield comparisons and structural analyses, thereby limiting…
Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a…
Faculty mobility is often understood as a mechanism through which universities redistribute scientific talent and potentially improve research performance. Yet the system-level structure of mobility and its association with individual…
The present study focuses on persistence in research productivity over the course of an individual's entire scientific career. We track 'late-career' scientists (those with at least 25 years of publishing experience; N=320,564) in…
Bibliographic data is a rich source of information that goes beyond the use cases of location and citation: it also encodes both cultural and technological context. For most of its existence, the scholarly record has changed slowly and…
In this study, the global scientific workforce is explored through large-scale, generational, cross-sectional, and longitudinal approaches. We examine 4.3 million nonoccasional scientists from 38 OECD countries publishing in 1990-2021. Our…