Databases - Scifaro

Maintaining Queries under Updates Using Heavy-Light Partitioning of the Input Relations

We study the classical incremental view maintenance problem: Given a query and a database, maintain the query output under single-tuple updates (inserts or deletes) to the database such that the tuples in the query output can be enumerated…

Databases · Computer Science 2026-05-12 Mahmoud Abo-Khamis, Eden Chmielewski, Andrei Draghici, Ahmet Kara, Dan Olteanu

Evaluating the Practical Effectiveness of LLM-Driven Index Tuning with Microsoft Database Tuning Advisor

Index tuning is critical for the performance of modern database systems. Industrial index tuners, such as the Database Tuning Advisor (DTA) developed for Microsoft SQL Server, rely on the "what-if" API provided by the query optimizer to…

Databases · Computer Science 2026-05-12 Xiaoying Wang, Wentao Wu, Vivek Narasayya, Surajit Chaudhuri

Translating database mathematical schemes into relational database software applications with MatBase

We present a pseudocode algorithm for translating our (Elementary) Mathematical Data Model schemes into relational ones and associated sets of non-relational constraints, used by MatBase, our intelligent data and knowledge base management…

Databases · Computer Science 2026-05-12 Christian Mancas, Diana Christina Mancas

MatBase algorithm for translating (E)MDM schemes into E-R data models

This paper presents a pseudocode algorithm for translating (Elementary) Mathematical Data Model ((E)MDM) schemes into Entity-Relationship data models. We prove that this algorithm is linear, sound, complete, and semi-optimal. As an example,…

Databases · Computer Science 2026-05-12 Christian Mancas, Diana Christina Mancas

LDI: Localized Data Imputation for Text-Rich Tables

Missing values are pervasive in real-world tabular data and can significantly impair downstream analysis. Imputing them is especially challenging in text-rich tables, where dependencies are implicit, complex, and dispersed across long…

Databases · Computer Science 2026-05-12 Soroush Omidvartehrani, Davood Rafiei

Towards a theory of Façade-X data access: satisfiability of SPARQL basic graph patterns

Data integration is the primary use case for knowledge graphs. However, integrated data are not typically graphs but come in different formats, for example, CSV, XML, or a relational database. Façade-X is a recently proposed method for…

Databases · Computer Science 2026-05-11 Luigi Asprino, Enrico Daga

Low-Latency Out-of-Core ANN Search in High-Dimensional Space

In-memory graph-based approximate nearest neighbor (ANN) search has superior search performance but incurs significant memory footprint. Disk-based methods reduce memory usage but suffer from high disk access latency. A common challenge is…

Databases · Computer Science 2026-05-08 Ziwen Song, Bin Wang, Xiaochun Yang, Junhua Zhang

An Extensible and Verifiable Language for Query Rewrite Rules

Logical query plan rewriting transforms a relational database query into an equivalent but more efficient form and is crucial to the performance of database-backed applications. In existing systems, rewrite rules are typically implemented…

Databases · Computer Science 2026-05-08 Sicheng Pan, Shuxian Wang, Wesley Zheng, Zirong Zeng, Vijay Sharma, Alvin Cheung

Anatomy of a Query: W5H Dimensions and FAR Patterns for Text-to-SQL Evaluation

Natural language interfaces to databases have gained popularity, yet the theoretical foundations for evaluating and designing these systems remain underdeveloped. We present QUEST (Query Understanding Evaluation through Semantic…

Databases · Computer Science 2026-05-08 Vicki Stover Hertzberg, Eduardo Valverde, Joyce C. Ho

AegisTS: A Hierarchical Agent System with Reinforcement Learning for Multivariate Time Series Data Cleaning

Multivariate time series (MTS) are frequently affected by co-occurring quality issues, such as missing values, outliers, and constraint violations, which significantly undermine downstream analytics. Existing cleaning approaches fix only a…

Databases · Computer Science 2026-05-08 Yuhan Shi, Yuanyuan Yao, Lu Chen, Mourad Khayati, Tianyi Li

U-HNSW: An Efficient Graph-based Solution to ANNS Under Universal Lp Metrics

Approximate nearest neighbor search under universal L_p metrics (ANNS-U-L_p) is an important and challenging research problem, as it requires answering queries under all possible p (0 < p <= 2) values simultaneously without building an index…

Databases · Computer Science 2026-05-08 Huayi Wang, Jingfan Meng, Jun Xu

DatAasee -- A Metadata-Lake as Metadata Catalog for a Virtual Data-Lake

Metadata management for distributed data sources is a long-standing but ever-growing problem. To counter this challenge in a research-data and library-oriented setting, this work constructs a data architecture, derived from the data-lake:…

Databases · Computer Science 2026-05-08 Christian Himpe

Efficient Cost-Based Rewrite in a Bottom-Up Optimizer

The query optimizer in a Database Management System (DBMS) translates declarative queries into efficient execution plans. Conventional bottom-up optimization consists of two main stages: Query Rewrite (QRW) and Cost-Based Optimization…

Databases · Computer Science 2026-05-07 Qi Cheng, Yang Sun, Weidong Yu, Danny Chen, Weicheng Wang, Chong Chen, Per-Ake Larson

Estimating Power-Law Exponent with Edge Differential Privacy

Many real-world graphs have degree distributions that are well approximated by a power-law, and the corresponding scaling parameter $\alpha$ provides a compact summary of that structure which is useful for graph analysis and system…

Databases · Computer Science 2026-05-07 Adam Tan, Mohamed Hefny, Keval Vora

A Graph-Native Approach to Normalization

In recent years, knowledge graphs (KGs), in particular in the form of labeled property graphs (LPGs), have become essential components in a broad range of applications. Although the absence of strict schemas for KGs facilitates structural…

Databases · Computer Science 2026-05-07 Johannes Schrott, Maxime Jakubowski, Katja Hose

CycleTrajectory: An End-to-End Pipeline for Enriching and Analyzing GPS Trajectories to Understand Cycling Behavior and Environment

Global positioning system (GPS) trajectories recorded by mobile phones or action cameras offer valuable insights into sustainable mobility, as they provide fine-scale spatial and temporal characteristics of individual travel. However, the…

Databases · Computer Science 2026-05-07 Meihui Wang, James Haworth, Ilya Ilyankou, Nicola Christie

Inconsistent Databases and Argumentation Frameworks with Collective Attacks

The connection between subset-maximal repairs for inconsistent databases involving various integrity constraints and acceptable sets of arguments within argumentation frameworks has recently drawn growing interest. In this paper, we…

Databases · Computer Science 2026-05-06 Yasir Mahmood, Jonni Virtema, Timon Barlag, Axel-Cyrille Ngonga Ngomo

ConRAD: Conformal Risk-Aware Neural Databases

Querying incomplete knowledge graphs with neural predictors is powerful but dangerous. Errors compound across multi-hop pipelines with no formal bound on the completeness of results. We introduce ConRAD, the first framework to enforce…

Databases · Computer Science 2026-05-06 Sonia Horchidan, Fabian Zeiher, Xiangyu Shi, Vasiliki Kalavri, Henrik Boström, Ioannis Kontoyiannis, Paris Carbone

In-memory Multidimensional Indexing Using the skd-tree

In this paper, we revisit the problem of indexing multi-dimensional data in memory for the efficient support of multi-dimensional range queries and nearest neighbor queries. This is a classic problem in main-memory databases, where there is…

Databases · Computer Science 2026-05-06 Achilleas Michalopoulos, Dimitrios Tsitsigkos, Nikos Mamoulis

FINER-SQL: Boosting Small Language Models for Text-to-SQL

Large language models have driven major advances in Text-to-SQL generation. However, they suffer from high computational cost, long latency, and data privacy concerns, which make them impractical for many real-world applications. A natural…

Databases · Computer Science 2026-05-06 Thanh Dat Hoang, Thanh Trung Huynh, Matthias Weidlich, Thanh Tam Nguyen, Tong Chen, Hongzhi Yin, Quoc Viet Hung Nguyen