Databases - Scifaro

Performant Synchronization in Geo-Distributed Databases

The deployment of databases across geographically distributed regions has become increasingly critical for ensuring data reliability and scalability. Recent studies indicate that distributed databases exhibit significantly higher latency…

Databases · Computer Science 2025-12-19 Duling Xu, Tong Li, Zegang Sun, Zheng Chen, Weixing Zhou, Yanfeng Zhang, Wei Lu, Xiaoyong Du

AutoDDG: Automated Dataset Description Generation using Large Language Models

The proliferation of datasets across open data portals and enterprise data lakes presents an opportunity for deriving data-driven insights. Widely-used dataset search systems rely on keyword search over dataset metadata, including…

Databases · Computer Science 2025-12-19 Haoxiang Zhang, Yurong Liu, Aécio Santos, Wei-Lun Hung, Juliana Freire

ArcBERT: An LLM-based Search Engine for Exploring Integrated Multi-Omics Metadata

Traditional search applications within Research Data Management (RDM) ecosystems are crucial in helping users discover and explore the structured metadata from the research datasets. Typically, text search engines require users to submit…

Databases · Computer Science 2025-12-18 Gajendra Doniparthi, Shashank Balu Pandhare, Stefan Deßloch, Timo Mühlhaus

Revisiting Task-Oriented Dataset Search in the Era of Large Language Models: Challenges, Benchmark, and Solution

The search for suitable datasets is the critical "first step" in data-driven research, but it remains a great challenge. Researchers often need to search for datasets based on high-level task descriptions. However, existing search systems…

Databases · Computer Science 2025-12-18 Zixin Wei, Yucan Guo, Jinyang Li, Xiaolin Han, Xiaolong Jin, Chenhao Ma

Graph Pattern-based Association Rules Evaluated Under No-repeated-anything Semantics in the Graph Transactional Setting

We introduce graph pattern-based association rules (GPARs) for directed labeled multigraphs such as RDF graphs. GPARs support both generative tasks, where a graph is extended, and evaluative tasks, where the plausibility of a graph is…

Databases · Computer Science 2025-12-18 Basil Ell

Extracting node comparison insights for the interactive exploration of property graphs

While scoring nodes in graphs to understand their importance (e.g., in terms of centrality) has been investigated for decades, comparing nodes in property graphs based on their properties has not, to our knowledge, yet been addressed. In…

Databases · Computer Science 2025-12-18 Cristina Aguiar, Jacques Chabin, Alexandre Chanson, Mirian Halfeld-Ferrari, Nicolas Hiot, Nicolas Labroche, Patrick Marcel, Verónika Peralta, Felipe Vasconcelos

MS-Index: Fast Top-k Subsequence Search for Multivariate Time Series under Euclidean Distance

Modern applications frequently collect and analyze temporal data in the form of multivariate time series (MTS) -- time series that contain multiple channels. A common task in this context is subsequence search, which involves identifying…

Databases · Computer Science 2025-12-18 Jens E. d'Hondt, Teun Kortekaas, Odysseas Papapetrou, Themis Palpanas

Stress-Testing Causal Claims via Cardinality Repairs

Causal analyses derived from observational data underpin high-stakes decisions in domains such as healthcare, public policy, and economics. Yet such conclusions can be surprisingly fragile: even minor data errors - duplicate records, or…

Databases · Computer Science 2025-12-18 Yarden Gabbay, Haoquan Guan, Shaull Almagor, El Kindi Rezig, Brit Youngmann, Babak Salimi

Downsizing Diffusion Models for Cardinality Estimation

Learned cardinality estimation requires accurate model designs to capture the local characteristics of probability distributions. However, existing models may fail to accurately capture complex, multilateral dependencies between attributes.…

Databases · Computer Science 2025-12-18 Xinhe Mu, Zhaoqi Zhou, Zaijiu Shang, Chuan Zhou, Gang Fu, Guiying Yan, Guoliang Li, Zhiming Ma

ProvSQL: A General System for Keeping Track of the Provenance and Probability of Data

We present the data model, design choices, and performance of ProvSQL, a general and easy-to-deploy provenance tracking and probabilistic database system implemented as a PostgreSQL extension. ProvSQL's data and query models closely reflect…

Databases · Computer Science 2025-12-18 Aryak Sen, Silviu Maniu, Pierre Senellart

Chase Anonymisation: Privacy-Preserving Knowledge Graphs with Logical Reasoning

We propose a novel framework to enable Knowledge Graphs (KGs) sharing while ensuring that information that should remain private is neither directly released nor indirectly exposed via derived knowledge, maintaining at the same time the…

Databases · Computer Science 2025-12-17 Luigi Bellomarini, Costanza Catalano, Andrea Coletta, Michela Iezzi, Pierangela Samarati

Approximating Queries on Probabilistic Graphs

Query evaluation over probabilistic databases is notoriously intractable -- not only in combined complexity, but often in data complexity as well. This motivates the study of approximation algorithms, and particularly of combined FPRASes,…

Databases · Computer Science 2025-12-17 Antoine Amarilli, Timothy van Bremen, Octave Gaspard, Kuldeep S. Meel

Database Research needs an Abstract Relational Query Language

For decades, SQL has been the default language for composing queries, but it is increasingly used as an artifact to be read and verified rather than authored. With Large Language Models (LLMs), queries are increasingly machine-generated,…

Databases · Computer Science 2025-12-16 Wolfgang Gatterbauer, Diandre Miguel Sabale

CoLSE: A Lightweight and Robust Hybrid Learned Model for Single-Table Cardinality Estimation using Joint CDF

Cardinality estimation (CE), the task of predicting the result size of queries, is a critical component of query optimization. Accurate estimates are essential for generating efficient query execution plans. Recently, machine learning…

Databases · Computer Science 2025-12-16 Lankadinee Rathuwadu, Guanli Liu, Christopher Leckie, Renata Borovica-Gajic

NeurIDA: Dynamic Modeling for Effective In-Database Analytics

Relational Database Management Systems (RDBMS) manage complex, interrelated data and support a broad spectrum of analytical tasks. With the growing demand for predictive analytics, the deep integration of machine learning (ML) into RDBMS…

Databases · Computer Science 2025-12-16 Lingze Zeng, Naili Xing, Shaofeng Cai, Peng Lu, Gang Chen, Jian Pei, Beng Chin Ooi

Updatable Balanced Index for Fast On-device Search with Auto-selection Model

Diverse types of edge data, such as 2D geo-locations and 3D point clouds, are collected by sensors like lidar and GPS receivers on edge devices. On-device searches, such as k-nearest neighbor (kNN) search and radius search, are commonly…

Databases · Computer Science 2025-12-16 Yushuai Ji, Sheng Wang, Zhiyu Chen, Yuan Sun, Zhiyong Peng

Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving

Increasing demand for Large Language Models (LLMs) services imposes substantial deployment and computation costs on providers. LLM routing offers a cost-efficient solution by directing queries to the optimal LLM based on model and query…

Databases · Computer Science 2025-12-16 Fangzhou Wu, Sandeep Silwal

Meta-Property Graphs: Extending Property Graphs with Metadata Awareness and Reification

The ISO standard Property Graph model has become increasingly popular for representing complex, interconnected data. However, it lacks native support for querying metadata and reification, which limits its abilities to deal with the demands…

Databases · Computer Science 2025-12-16 Sepehr Sadoughi, Nikolay Yakovets, George Fletcher

IRG: Modular Synthetic Relational Database Generation with Complex Relational Schemas

Relational databases (RDBs) are widely used by corporations and governments to store multiple related tables. Their relational schemas pose unique challenges to synthetic data generation for privacy-preserving data sharing, e.g., for…

Databases · Computer Science 2025-12-16 Jiayu Li, Zilong Zhao, Milad Abdollahzadeh, Biplab Sikdar, Y. C. Tay

Bridging Textual Data and Conceptual Models: A Model-Agnostic Structuring Approach

We introduce an automated method for structuring textual data into a model-agnostic schema, enabling alignment with any database model. It generates both a schema and its instance. Initially, textual data is represented as semantically…

Databases · Computer Science 2025-12-15 Jacques Chabin, Mirian Halfeld Ferrari, Nicolas Hiot