Databases - Scifaro

Infinite Stream Estimation under Personalized $w$-Event Privacy

Streaming data collection is indispensable for stream data analysis, such as event monitoring. However, publishing these data directly leads to privacy leaks. $w$-event privacy is a valuable tool to protect individual privacy within a given…

Databases · Computer Science 2025-09-11 Leilei Du, Peng Cheng, Lei Chen, Heng Tao Shen, Xuemin Lin, Wei Xi

Polyglot Persistence in Microservices: Managing Data Diversity in Distributed Systems

Microservices architectures have become the foundation for developing scalable and modern software systems, but they also bring significant challenges in managing heterogeneous and distributed data. The pragmatic solution is polyglot…

Databases · Computer Science 2025-09-11 Festim Halili, Anila Nuhiji, Diellza Mustafai Veliu

Proving correctness for SQL implementations of OCL constraints

In the context of the model-driven development of data-centric applications, OCL constraints play a major role in adding precision to the source models (e.g., data models and security models). Several code-generators have been proposed to…

Databases · Computer Science 2025-09-11 Hoang Nguyen, Manuel Clavel

Filtered Approximate Nearest Neighbor Search: A Unified Benchmark and Systematic Experimental Study [Experiment, Analysis & Benchmark]

For a given dataset $\mathcal{D}$ and structured label $f$, the goal of Filtered Approximate Nearest Neighbor Search (FANNS) algorithms is to find top-$k$ points closest to a query that satisfy label constraints, while ensuring both recall…

Databases · Computer Science 2025-09-10 Jiayang Shi, Yuzheng Cai, Weiguo Zheng

JOINT: Join Optimization and Inference via Network Traversal

Traditional relational databases require users to manually specify join keys and assume exact matches between column names and values. In practice, this limits joinability across fragmented or inconsistently named tables. We propose a fuzzy…

Databases · Computer Science 2025-09-10 Szu-Yun Ko, Ethan Chen, Bo-Cian Chang, Alan Shu-Luen Chang

Private Queries with Sigma-Counting

Many data applications involve counting queries, where a client specifies a feasible range of variables and a database returns the corresponding item counts. A program that produces the counts of different queries often risks leaking…

Databases · Computer Science 2025-09-10 Jun Gao, Jie Ding

Navigating the Data Space Landscape: Concepts, Applications, and Future Directions

This paper explores the evolving landscape of data spaces, focusing on key concepts, practical applications, and emerging future directions. It begins by introducing the foundational principles that underpin data space architectures,…

Databases · Computer Science 2025-09-10 Bojana Marojevikj, Riste Stojanov

DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation

Research on learned cardinality estimation has made significant progress in recent years. However, existing methods still face distinct challenges that hinder their practical deployment in production environments. We define these challenges…

Databases · Computer Science 2025-09-10 Kaixin Zhang, Hongzhi Wang, Ziqi Li, Yabin Lu, Yingze Li, Yu Yan, Yiming Guan

Relational Algebras for Subset Selection and Optimisation

The database community lacks a unified relational query language for subset selection and optimisation queries, limiting both user expression and query optimiser reasoning about such problems. Decades of research (latterly under the rubric…

Databases · Computer Science 2025-09-09 David Robert Pratten, Luke Mathieson, Fahimeh Ramezani

MCTuner: Spatial Decomposition-Enhanced Database Tuning via LLM-Guided Exploration

Database knob tuning is essential for optimizing the performance of modern database management systems, which often expose hundreds of knobs with continuous or categorical values. However, the large number of knobs and the vast…

Databases · Computer Science 2025-09-09 Zihan Yan, Rui Xi, Mengshu Hou

A Unified Framework for Cultural Heritage Data Historicity and Migration: The ARGUS Approach

Cultural heritage preservation faces significant challenges in managing diverse, multi-source, and multi-scale data for effective monitoring and conservation. This paper documents a comprehensive data historicity and migration framework…

Databases · Computer Science 2025-09-09 Lingxiao Kong, Apostolos Sarris, Miltiadis Polidorou, Victor Klingenberg, Vasilis Sevetlidis, Vasilis Arampatzakis, George Pavlidis, Cong Yang, Zeyd Boukhers

Computing Inconsistency Measures Under Differential Privacy

Assessing data quality is crucial to knowing whether and how to use the data for different purposes. Specifically, given a collection of integrity constraints, various ways have been proposed to quantify the inconsistency of a database.…

Databases · Computer Science 2025-09-09 Shubhankar Mohapatra, Amir Gilad, Xi He, Benny Kimelfeld

Efficient Exact Resistance Distance Computation on Small-Treewidth Graphs: a Labelling Approach

Resistance distance computation is a fundamental problem in graph analysis, yet existing random walk-based methods are limited to approximate solutions and suffer from poor efficiency on small-treewidth graphs (e.g., road networks). In…

Databases · Computer Science 2025-09-08 Meihao Liao, Yueyang Pan, Rong-Hua Li, Guoren Wang

Global Hash Tables Strike Back! An Analysis of Parallel GROUP BY Aggregation

Efficiently computing group aggregations (i.e., GROUP BY) on modern architectures is critical for analytic database systems. Hash-based approaches in today's engines predominantly use a partitioned approach, in which incoming data is…

Databases · Computer Science 2025-09-08 Daniel Xue, Ryan Marcus

An Evaluation of N-Gram Selection Strategies for Regular Expression Indexing in Contemporary Text Analysis Tasks. Extended Version

Efficient evaluation of regular expressions (regex, for short) is crucial for text analysis, and n-gram indexes are fundamental to achieving fast regex evaluation performance. However, these indexes face scalability challenges because of…

Databases · Computer Science 2025-09-08 Ling Zhang, Shaleen Deep, Jignesh M. Patel, Karthikeyan Sankaralingam

The KG-ER Conceptual Schema Language

We propose KG-ER, a conceptual schema language for knowledge graphs that describes the structure of knowledge graphs independently of their representation (relational databases, property graphs, RDF) while helping to capture the semantics…

Databases · Computer Science 2025-09-05 Enrico Franconi, Benoît Groz, Jan Hidders, Nina Pardal, Sławek Staworko, Jan Van den Bussche, Piotr Wieczorek

GPU Acceleration of SQL Analytics on Compressed Data

GPUs are uniquely suited to accelerate (SQL) analytics workloads thanks to their massive compute parallelism and High Bandwidth Memory (HBM) -- when datasets fit in the GPU HBM, performance is unparalleled. Unfortunately, GPU HBMs remain…

Databases · Computer Science 2025-09-05 Zezhou Huang, Krystian Sakowski, Hans Lehnert, Wei Cui, Carlo Curino, Matteo Interlandi, Marius Dumitru, Rathijit Sen

Text2Cypher: Data Pruning using Hard Example Selection

Database query languages such as SQL for relational databases and Cypher for graph databases have been widely adopted. Recent advancements in large language models (LLMs) enable natural language interactions with databases through models…

Databases · Computer Science 2025-09-05 Makbule Gulcin Ozsoy

Enhancing Text2Cypher with Schema Filtering

Knowledge graphs represent complex data using nodes, relationships, and properties. Cypher, a powerful query language for graph databases, enables efficient modeling and querying. Recent advancements in large language models allow…

Databases · Computer Science 2025-09-05 Makbule Gulcin Ozsoy

Adaptive KV-Cache Compression without Manually Setting Budget

Large language model (LLM) inference relies heavily on KV-caches to accelerate autoregressive decoding, but the resulting memory footprint grows rapidly with sequence length, posing significant efficiency challenges. Current KV-cache…

Databases · Computer Science 2025-09-04 Chenxia Tang, Jianchun Liu, Hongli Xu, Liusheng Huang