Databases - Scifaro

Quality of Descriptive Information on Cultural Heritage Objects: Definition and Empirical Evaluation

Effective data processing depends on the quality of the underlying data. However, quality issues such as inconsistencies and uncertainties can significantly impede the processing and subsequent use of data. Despite the centrality of data…

Databases · Computer Science 2026-02-26 Markus Matoni, Arno Kesper, Gabriele Taentzer

BuffCut: Prioritized Buffered Streaming Graph Partitioning

Streaming graph partitioners enable resource-efficient and massively scalable partitioning, but one-pass assignment heuristics are highly sensitive to stream order and often yield substantially higher edge cuts than in-memory methods. We…

Databases · Computer Science 2026-02-26 Linus Baumgärtner, Adil Chhabra, Marcelo Fonseca Faraj, Christian Schulz

Premature Dimensional Collapse and Tensor-based Execution Paths for High-Dimensional Relational Operations in Cost-Based Database Systems

Modern cost-based DBMSs frequently exhibit execution instability and tail-latency amplification when high-dimensional relational operations trigger memory-regime transitions such as hash-table spilling and external materialization. We…

Databases · Computer Science 2026-02-26 Il-Sun Chang

Topological Relational Theory: A Simplicial-Complex View of Functional Dependencies, Lossless Decomposition, and Acyclicity

We develop a topological lens on relational schema design by encoding functional dependencies (FDs) as simplices of an abstract simplicial complex. This dependency complex exposes multi-attribute interactions and enables homological…

Databases · Computer Science 2026-02-26 Bilge Senturk, Faruk Alpay

Fast Private Adaptive Query Answering for Large Data Domains

Privately releasing marginals of a tabular dataset is a foundational problem in differential privacy. However, state-of-the-art mechanisms suffer from a computational bottleneck when marginal estimates are reconstructed from noisy…

Databases · Computer Science 2026-02-26 Miguel Fuentes, Brett Mullins, Yingtai Xiao, Daniel Kifer, Cameron Musco, Daniel Sheldon

From RDF Graph Validation to RDF Dataset Validation with SHACL-DS

The Shapes Constraint Language (SHACL) is the W3C Recommendation for validating a single RDF graph. This makes SHACL inadequate for validating data across (named) graphs in an RDF dataset. Existing workarounds, such as graph unions or…

Databases · Computer Science 2026-02-26 Davan Chiem Dao, Christophe Debruyne

High-Fidelity And Complex Test Data Generation For Google SQL Code Generation Services

The demand for high-fidelity test data is paramount in industrial settings where access to production data is largely restricted. Traditional data generation methods often fall short, struggling with low fidelity and the ability to model…

Databases · Computer Science 2026-02-26 Shivasankari Kannan, Yeounoh Chung, Amita Gondi, Tristan Swadell, Fatma Ozcan

RISK: Efficiently processing rich spatial-keyword queries on encrypted geo-textual data

Symmetric searchable encryption (SSE) for geo-textual data has attracted significant attention. However, existing schemes rely on task-specific, incompatible indices for isolated specific secure queries (e.g., range or k-nearest neighbor…

Databases · Computer Science 2026-02-25 Zhen Lv, Cong Cao, Hongwei Huo, Jiangtao Cui, Yanguo Peng, Hui Li, Yingfan Liu

cuRPQ: A High-Performance GPU-Based Framework for Processing Regular and Conjunctive Regular Path Queries

Regular path queries (RPQs) are fundamental for path-constrained reachability analysis, and more complex variants such as conjunctive regular path queries (CRPQs) are increasingly used in graph analytics. Evaluating these queries is…

Databases · Computer Science 2026-02-25 Sungwoo Park, Seohyeon Kim, Min-Soo Kim

A Survey of Data Agents: Emerging Paradigm or Overstated Hype?

The rapid advancement of large language models (LLMs) has spurred the emergence of data agents, autonomous systems designed to orchestrate Data + AI ecosystems for tackling complex data-related tasks. However, the term "data agent"…

Databases · Computer Science 2026-02-25 Yizhang Zhu, Liangwei Wang, Chenyu Yang, Xiaotian Lin, Boyan Li, Wei Zhou, Xinyu Liu, Zhangyang Peng, Tianqi Luo, Yu Li, Chengliang Chai, Chong Chen, Shimin Di, Ju Fan, Ji Sun, Nan Tang, Fugee Tsung, Jiannan Wang, Chenglin Wu, Yanwei Xu, Shaolei Zhang, Yong Zhang, Xuanhe Zhou, Guoliang Li, Yuyu Luo

A Context-Aware Knowledge Graph Platform for Stream Processing in Industrial IoT

Industrial IoT ecosystems bring together sensors, machines and smart devices operating collaboratively across industrial environments. These systems generate large volumes of heterogeneous, high-velocity data streams that require…

Databases · Computer Science 2026-02-24 Monica Marconi Sciarroni, Emanuele Storti

Semantic Caching for OLAP via LLM-Based Query Canonicalization (Extended Version)

Analytical workloads exhibit substantial semantic repetition, yet most production caches key entries by SQL surface form (text or AST), fragmenting reuse across BI tools, notebooks, and NL interfaces. We introduce a safety-first middleware…

Databases · Computer Science 2026-02-24 Laurent Bindschaedler

The Climate Change Knowledge Graph: Supporting Climate Services

Climate change impacts a broad spectrum of human resources and activities, necessitating the use of climate models to project long-term effects and inform mitigation and adaptation strategies. These models generate multiple datasets by…

Databases · Computer Science 2026-02-24 Miguel Ceriani, Fiorela Ciroku, Alessandro Russo, Massimiliano Schembri, Fai Fung, Neha Mittal, Vito Trianni, Andrea Giovanni Nuzzolese

Breaking the Barriers of Database-Agnostic Transactions

Federated transaction management has long been used as a method to virtually integrate multiple databases from a transactional perspective, ensuring consistency across the databases. Modern approaches manage transactions on top of a…

Databases · Computer Science 2026-02-24 Toshihiro Suzuki, Hiroyuki Yamada

PIPE-RDF: An LLM-Assisted Pipeline for Enterprise RDF Benchmarking

Enterprises rely on RDF knowledge graphs and SPARQL to expose operational data through natural language interfaces, yet public KGQA benchmarks do not reflect proprietary schemas, prefixes, or query distributions. We present PIPE-RDF, a…

Databases · Computer Science 2026-02-24 Suraj Ranganath

RDBLearn: Simple In-Context Prediction Over Relational Databases

Recent advances in tabular in-context learning (ICL) show that a single pretrained model can adapt to new prediction tasks from a small set of labeled examples, avoiding per-task training and heavy tuning. However, many real-world tasks…

Databases · Computer Science 2026-02-24 Yanlin Zhang, Linjie Xu, Quan Gan, David Wipf, Minjie Wang

Vibe Coding on Trial: Operating Characteristics of Unanimous LLM Juries

Large Language Models (LLMs) are now good enough at coding that developers can describe intent in plain language and let the tool produce the first code draft, a workflow increasingly built into tools like GitHub Copilot, Cursor, and…

Databases · Computer Science 2026-02-24 Muhammad Aziz Ullah, Abdul Serwadda

SQL-Exchange: Transforming SQL Queries Across Domains

We introduce SQL-Exchange, a framework for mapping SQL queries across different database schemas by preserving the source query structure while adapting domain-specific elements to align with the target schema. We investigate the conditions…

Databases · Computer Science 2026-02-24 Mohammadreza Daviran, Brian Lin, Davood Rafiei

Bigger Is Not Better: The Fastest Static GPU Index Is Also Lightweight!

Sorting and binary searching a dense array can be considered the simplest and most space-efficient form of indexing. This holds especially on GPUs as they exhibit exceptional sorting performance. However, the popular opinion is that such a…

Databases · Computer Science 2026-02-24 Justus Henneberg, Felix Schuhknecht

Survey: Graph Databases

Graph databases have become essential tools for managing complex and interconnected data, which is common in areas like social networks, bioinformatics, and recommendation systems. Unlike traditional relational databases, graph databases…

Databases · Computer Science 2026-02-24 Miguel E. Coimbra, Lucie Svitáková, Domagoj Vrgoč, Alexandre P. Francisco, Luís Veiga