Databases - Scifaro

Accelerating Graph Similarity Search through Integer Linear Programming

The Graph Edit Distance (GED) is an important metric for measuring the similarity between two (labeled) graphs. It is defined as the minimum cost required to convert one graph into another through a series of (elementary) edit operations.…

Databases · Computer Science 2025-11-05 Andrea D'Ascenzo, Julian Meffert, Petra Mutzel, Fabrizio Rossi

Numbering Combinations for Compact Representation of Many-to-Many Relationship Sets

In this paper we propose an approach to implementing a specific relationship set between two entities, called a combinatorial relationship set. For the combinatorial relationship set B between entity sets G and I, the mapping cardinality is…

Databases · Computer Science 2025-11-05 Savo Tomovic

Vortex: Hosting ML Inference and Knowledge Retrieval Services With Tight Latency and Throughput Requirements

There is growing interest in deploying ML inference and knowledge retrieval as services that could support both interactive queries by end users and more demanding request flows that arise from AIs integrated into end-user applications…

Databases · Computer Science 2025-11-05 Yuting Yang, Tiancheng Yuan, Jamal Hashim, Thiago Garrett, Jeffrey Qian, Ann Zhang, Yifan Wang, Weijia Song, Ken Birman

InteracSPARQL: An Interactive System for SPARQL Query Refinement Using Natural Language Explanations

In recent years, querying semantic web data using SPARQL has remained challenging, especially for non-expert users, due to the language's complex syntax and the prerequisite of understanding intricate data structures. To address these…

Databases · Computer Science 2025-11-05 Xiangru Jian, Zhengyuan Dong, M. Tamer Özsu

An Experimental Comparison of Alternative Techniques for Event-Log Augmentation

Process mining analyzes and improves processes by examining transactional data stored in event logs, which record sequences of events with timestamps. However, the effectiveness of process mining, especially when combined with machine or…

Databases · Computer Science 2025-11-05 Alessandro Padella, Francesco Vinci, Massimiliano de Leoni

ORANGE: An Online Reflection ANd GEneration framework with Domain Knowledge for Text-to-SQL

Large Language Models (LLMs) have demonstrated remarkable progress in translating natural language to SQL, but a significant semantic gap persists between their general knowledge and domain-specific semantics of databases. Historical…

Databases · Computer Science 2025-11-05 Yiwen Jiao, Tonghui Ren, Yuche Gao, Zhenying He, Yinan Jing, Kai Zhang, X. Sean Wang

UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data

In the real business world, data is stored in a variety of sources, including structured relational databases, unstructured databases (e.g., NoSQL databases), or even CSV/Excel files. The ability to extract reasonable insights across these…

Databases · Computer Science 2025-11-04 Han Weng, Zhou Liu, Yuanfeng Song, Xiaoming Yin, Xing Chen, Wentao Zhang

Fast Answering Pattern-Constrained Reachability Queries with Two-Dimensional Reachability Index

Reachability queries ask whether there exists a path from the source vertex to the target vertex on a graph. Recently, several powerful reachability queries, such as Label-Constrained Reachability (LCR) queries and Regular Path Queries…

Databases · Computer Science 2025-11-04 Huihui Yang, Pingpeng Yuan

PathFinder: Efficiently Supporting Conjunctions and Disjunctions for Filtered Approximate Nearest Neighbor Search

Filtered approximate nearest neighbor search (ANNS) restricts the search to data objects whose attributes satisfy a given filter and retrieves the top-$K$ objects that are most semantically similar to the query object. Many graph-based ANNS…

Databases · Computer Science 2025-11-04 Tianming Wu, Dixin Tang

Efficient Query Repair for Aggregate Constraints

In many real-world scenarios, query results must satisfy domain-specific constraints. For instance, a minimum percentage of interview candidates selected based on their qualifications should be female. These requirements can be expressed as…

Databases · Computer Science 2025-11-04 Shatha Algarni, Boris Glavic, Seokki Lee, Adriane Chapman

Finding Non-Redundant Simpson's Paradox from Multidimensional Data

Simpson's paradox, a long-standing statistical phenomenon, describes the reversal of an observed association when data are disaggregated into sub-populations. It has critical implications across statistics, epidemiology, economics, and…

Databases · Computer Science 2025-11-04 Yi Yang, Jian Pei, Jun Yang, Jichun Xie

Object-Centric Analysis of XES Event Logs: Integrating OCED Modeling with SPARQL Queries

Object Centric Event Data (OCED) has gained attention in recent years within the field of process mining. However, there are still many challenges, such as connecting the XES format to object-centric approaches to enable more insightful…

Databases · Computer Science 2025-11-04 Saba Latif, Huma Latif, Muhammad Rameez Ur Rahman

Embedding based Encoding Scheme for Privacy Preserving Record Linkage

To discover new insights from data, there is a growing need to share information that is often held by different organisations. One key task in data integration is the calculation of similarities between records in different databases to…

Databases · Computer Science 2025-11-04 Sirintra Vaiwsri, Thilina Ranbaduge

Balancing the Blend: An Experimental Analysis of Trade-offs in Hybrid Search

Hybrid search, the integration of lexical and semantic retrieval, has become a cornerstone of modern information retrieval systems, driven by demanding applications like Retrieval-Augmented Generation (RAG). The architectural design space…

Databases · Computer Science 2025-11-04 Mengzhao Wang, Boyu Tan, Yunjun Gao, Hai Jin, Yingfeng Zhang, Xiangyu Ke, Xiaoliang Xu, Yifan Zhu

ACTIVE: Continuous Similarity Search for Vessel Trajectories

Publicly available vessel trajectory data is emitted continuously from the global AIS system. Continuous trajectory similarity search on this data has applications in, e.g., maritime navigation and safety. Existing proposals typically…

Databases · Computer Science 2025-11-04 Tiantian Liu, Hengyu Liu, Tianyi Li, Kristian Torp, Christian S. Jensen

Intermediate Relation Size Bounds for Select-Project-Join-Union Query Plans

We study the problem of statically optimizing select-project-join-union (SPJU) plans where unary key constraints are allowed. A natural measure of a plan, which we call the output degree and which has been studied previously, is the minimum…

Databases · Computer Science 2025-11-04 Hubie Chen, Markus Schneider

Approximate Diverse $k$-nearest Neighbor Search in Vector Database

Approximate $k$-nearest neighbor search (A$k$-NNS) is a core operation in vector databases, underpinning applications such as retrieval-augmented generation (RAG) and image retrieval. In these scenarios, users often prefer diverse result…

Databases · Computer Science 2025-11-03 Jiachen Zhao, Xiao Yan, Eric Lo

DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries

Manually conducting real-world data analyses is labor-intensive and inefficient. Despite numerous attempts to automate data science workflows, none of the existing paradigms or systems fully demonstrate all three key capabilities required…

Databases · Computer Science 2025-11-03 Chuxuan Hu, Maxwell Yang, James Weiland, Yeji Lim, Suhas Palawala, Daniel Kang

ShapleyPipe: Hierarchical Shapley Search for Data Preparation Pipeline Construction

Automated data preparation pipeline construction is critical for machine learning success, yet existing methods suffer from two fundamental limitations: they treat pipeline construction as black-box optimization without quantifying…

Databases · Computer Science 2025-11-03 Jing Chang, Chang Liu, Jinbin Huang, Shuyuan Zheng, Rui Mao, Jianbin Qin

Unstructured Data Analysis using LLMs: A Comprehensive Benchmark

Nowadays, the explosion of unstructured data presents immense analytical value. Leveraging the remarkable capability of large language models (LLMs) in extracting attributes of structured tables from unstructured data, researchers are…

Databases · Computer Science 2025-11-03 Qiyan Deng, Jianhui Li, Chengliang Chai, Jinqi Liu, Junzhi She, Kaisen Jin, Zhaoze Sun, Yuhao Deng, Jia Yuan, Ye Yuan, Guoren Wang, Lei Cao