数据库
Effective data discovery is a cornerstone of modern data-driven decision-making. Yet, identifying datasets with specific distributional characteristics, such as percentiles or preferences, remains challenging. While recent proposals have…
After decades of research in approximate query processing (AQP), its adoption in the industry remains limited. Existing methods struggle to simultaneously provide user-specified error guarantees, eliminate maintenance overheads, and avoid…
Synthetic data offers a promising solution to privacy concerns in healthcare by generating useful datasets in a privacy-aware manner. However, although synthetic data is typically developed with the intention of sharing said data, ambiguous…
Process mining has grown popular today given their ability to provide managers with insights into the actual business process as executed by employees. Process mining depends on event logs found in process aware information systems to model…
We investigate the evaluation of conjunctive queries over static and dynamic relations. While static relations are given as input and do not change, dynamic relations are subject to inserts and deletes. We characterise syntactically three…
Surrogate keys are now extensively utilized by database designers to implement keys in SQL tables. They are straightforward, easy to understand, and enable efficient access, despite lacking any real-world semantic meaning. In this context,…
Combining query answering and data science workloads has become prevalent. An important class of such workloads is top-k queries with a scoring function implemented as an opaque UDF - a black box whose internal structure and scores on the…
The Cognitive Data Model (CDM) is proposed. A novel approach to database design, inspired by the belief that the human brain operates with a logical data model independent of its anatomical structure. The study aims to identify and…
One fundamental question in database theory is the following: Given a Boolean conjunctive query Q, what is the best complexity for computing the answer to Q in terms of the input database size N? When restricted to the class of…
Entity Alignment (EA) aims to detect descriptions of the same real-world entities among different Knowledge Graphs (KG). Several embedding methods have been proposed to rank potentially matching entities of two KGs according to their…
In the age of big data, it is important for primary research data to follow the FAIR principles of findability, accessibility, interoperability, and reusability. Data harmonization enhances interoperability and reusability by aligning…
Knowledge Graphs (KGs) are widely used in data-driven applications and downstream tasks, such as virtual assistants, recommendation systems, and semantic search. The accuracy of KGs directly impacts the reliability of the inferred knowledge…
Query optimizers are essential components of relational database management systems that directly impact query performance as they transform input queries into efficient execution plans. While users can obtain the final execution plan using…
Relational Keyword Search (R-KwS) systems enable naive/informal users to explore and retrieve information from relational databases without requiring schema knowledge or query-language proficiency. Although numerous R-KwS methods have been…
This study proposes a novel storage engine, SynchroStore, designed to address the inefficiency of update operations in columnar storage systems based on Log-Structured Merge Trees (LSM-Trees) under hybrid workload scenarios. While columnar…
We consider the problem of defining semantic metrics for relational database queries. Informally, a semantic query metric for a query language $L$ is a metric function $\delta:L\times L\to \mathbb{N}$ where $\delta(Q_1, Q_2)$ represents the…
Query Containment Problem (QCP) is one of the most fundamental decision problems in database query processing and optimization. Complexity of QCP for conjunctive queries (QCP-CQ) has been fully understood since 1970s. But, as Chaudhuri and…
Caching has the potential to be of significant benefit for accessing large language models (LLMs) due to their high latencies which typically range from a small number of seconds to well over a minute. Furthermore, many LLMs charge money…
Although they exist since more than ten years already, have attracted diverse implementations, and have been used successfully in a significant number of applications, declarative mapping languages for constructing knowledge graphs from…
In this demonstration, we present AnDB, an AI-native database that supports traditional OLTP workloads and innovative AI-driven tasks, enabling unified semantic analysis across structured and unstructured data. While structured data…