Databases - Scifaro

TCDRM: A Tenant Budget-Aware Data Replication Framework for Multi-Cloud Computing

Multi-cloud computing systems face significant challenges in ensuring acceptable performance while adhering to tenant budget requirements. This paper proposes a tenant budget-aware (tenant-centric) data replication framework for Multi-Cloud…

Databases · Computer Science 2025-10-10 Santatra Hagamalala Bernardin, Riad Mokadem, Franck Morvan, Hasinarivo Ramanana, Hasimandimby Rakotoarivelo

Independence Under Incomplete Information

We initiate an investigation of how the fundamental concept of independence can be represented effectively in the presence of incomplete information in relational databases. The concepts of possible and certain independence are proposed, and…

Databases · Computer Science 2025-10-10 Miika Hannula, Minna Hirvonen, Juha Kontinen, Sebastian Link

SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Natural Language to SQL (NL2SQL) enables intuitive interactions with databases by transforming natural language queries into structured SQL statements. Despite recent advancements in enhancing human-computer interaction within database…

Databases · Computer Science 2025-10-10 Peixian Ma, Xialie Zhuang, Chengjin Xu, Xuhui Jiang, Ran Chen, Jian Guo

Continuous Subgraph Matching via Cost-Model-based Dynamic Vertex Dominance Embeddings (Technical Report)

In many real-world applications such as social network analysis, knowledge graph discovery, biological network analytics, and so on, graph data management has become increasingly important and has drawn much attention from the database…

Databases · Computer Science 2025-10-10 Yutong Ye, Xiang Lian, Nan Zhang, Mingsong Chen

On the Expressiveness of Languages for Querying Property Graphs in Relational Databases

SQL/PGQ is the emerging ISO standard for querying property graphs defined as views over relational data. We formalize its expressive power across three fragments: the read-only core, the read-write extension, and an extended variant with…

Databases · Computer Science 2025-10-09 Hadar Rotschield, Liat Peterfreund

Bridging Imperative Process Models and Process Data Queries-Translation and Relaxation

Business process management is increasingly practiced using data-driven approaches. Still, classical imperative process models, which are typically formalized using Petri nets, are not straightforwardly applicable to the relational…

Databases · Computer Science 2025-10-09 Abdur Rehman Anwar Qureshi, Adrian Rebmann, Timotheus Kampik, Matthias Weidlich, Mathias Weske

FastER: On-Demand Entity Resolution in Property Graphs

Entity resolution (ER) is the problem of identifying and linking database records that refer to the same real-world entity. Traditional ER methods use batch processing, which becomes impractical with growing data volumes due to high…

Databases · Computer Science 2025-10-09 Shujing Wang, Sibo Zhao, Shiqi Miao, Selasi Kwashie, Michael Bewong, Junwei Hu, Vincent M. Nofong, Zaiwen Feng

Speeding up SQL subqueries via decoupling of non-correlated predicate (extended version)

In this paper, we discuss a novel technique for processing correlated subqueries in SQL. The core idea is to isolate the non-correlated part of the predicate and use it to reduce the number of evaluations of the correlated part. We begin by…

Databases · Computer Science 2025-10-08 Dmitrii Radivonchik, Yakov Kuzin, Anton Chizhov, Dmitriy Shcheka, Mikhail Firsov, Kirill Smirnov, George Chernishev

Redefining Cost Estimation in Database Systems: The Role of Execution Plan Features and Machine Learning

Accurate query runtime prediction is a critical component of effective query optimization in modern database systems. Traditional cost models, such as those used in PostgreSQL, rely on static heuristics that often fail to reflect actual…

Databases · Computer Science 2025-10-08 Utsav Pathak, Amit Mankodi

Ambidextrous Degree Sequence Bounds for Pessimistic Cardinality Estimation

In a large database system, upper-bounding the cardinality of a join query is a crucial task called pessimistic cardinality estimation. Recently, Abo Khamis, Nakos, Olteanu, and Suciu unified related works into the following…

Databases · Computer Science 2025-10-07 Yu-Ting Lin, Hsin-Po Wang

Is it Bigger than a Breadbox: Efficient Cardinality Estimation for Real World Workloads

DB engines produce efficient query execution plans by relying on cost models. Practical implementations estimate cardinality of queries using heuristics, with magic numbers tuned to improve average performance on benchmarks. Empirically,…

Databases · Computer Science 2025-10-07 Zixuan Yi, Sami Abu-el-Haija, Yawen Wang, Teja Vemparala, Yannis Chronis, Yu Gan, Michael Burrows, Carsten Binnig, Bryan Perozzi, Ryan Marcus, Fatma Ozcan

A New Normalization Form for Limited Distinct Attributes

In modern databases, the practice of data normalization continues to be important in improving data integrity, minimizing redundancies, and eliminating anomalies. However, since its inception and consequent improvements, there have been no…

Databases · Computer Science 2025-10-06 Niko S. Snell, Rayen C. Lee

Galley: Modern Query Optimization for Sparse Tensor Programs

The tensor programming abstraction is a foundational paradigm which allows users to write high performance programs via a high-level imperative interface. Recent work on sparse tensor compilers has extended this paradigm to sparse tensors…

Databases · Computer Science 2025-10-06 Kyle Deeds, Willow Ahrens, Magda Balazinska, Dan Suciu

EMR-AGENT: Automating Cohort and Feature Extraction from EMR Databases

Machine learning models for clinical prediction rely on structured data extracted from Electronic Medical Records (EMRs), yet this process remains dominated by hardcoded, database-specific pipelines for cohort definition, feature selection,…

Databases · Computer Science 2025-10-03 Kwanhyung Lee, Sungsoo Hong, Joonhyung Park, Jeonghyeop Lim, Juhwan Choi, Donghwee Yoon, Eunho Yang

GeoSQL-Eval: First Evaluation of LLMs on PostGIS-Based NL2GeoSQL Queries

Large language models (LLMs) have shown strong performance in natural language to SQL (NL2SQL) tasks within general databases. However, extending to GeoSQL introduces additional complexity from spatial data types, function invocation, and…

Databases · Computer Science 2025-10-03 Shuyang Hou, Haoyue Jiao, Ziqi Liu, Lutong Xie, Guanyu Chen, Shaowen Wu, Xuefeng Guan, Huayi Wu

Data Quality Taxonomy for Data Monetization

This chapter presents a comprehensive taxonomy for assessing data quality in the context of data monetisation, developed through a systematic literature review. Organising over one hundred metrics and Key Performance Indicators (KPIs) into…

Databases · Computer Science 2025-10-02 Eduardo Vyhmeister, Bastien Pietropoli, Andrea Visentin

Lost Data in Electron Microscopy

The goal of this study is to estimate the amount of lost data in electron microscopy and to analyze the extent to which experimentally acquired images are utilized in peer-reviewed scientific publications. Analysis of the number of images…

Databases · Computer Science 2025-10-02 Nina M. Ivanova, Alexey S. Kashin, Valentine P. Ananikov

The Grammar of FAIR: A Granular Architecture of Semantic Units for FAIR Semantics, Inspired by Biology and Linguistics

The FAIR Principles aim to make data and knowledge Findable, Accessible, Interoperable, and Reusable, yet current digital infrastructures often lack a unifying semantic framework that bridges human cognition and machine-actionability. In…

Databases · Computer Science 2025-10-01 Lars Vogt, Barend Mons

Experiversum: an Ecosystem for Curating and Enhancing Data-Driven Experimental Science

This paper introduces Experiversum, a lakehouse-based ecosystem that supports the curation, documentation and reproducibility of exploratory experiments. Experiversum enables structured research through iterative data cycles, while…

Databases · Computer Science 2025-10-01 Genoveva Vargas-Solar, Umberto Costa, Jérôme Darmont, Javier Espinosa-Oviedo, Carmem Hara, Sabine Loudcher, Regina Motz, Martin A. Musicante, José-Luis Zechinelli-Martini

PAT: Pattern-Perceptive Transformer for Error Detection in Relational Databases

Error detection in relational databases is critical for maintaining data quality and is fundamental to tasks such as data cleaning and assessment. Current error detection studies mostly employ the multi-detector approach to handle…

Databases · Computer Science 2025-10-01 Jian Fu, Xixian Han, Xiaolong Wan, Wenjian Wang