数据库
The deployment of databases across geographically distributed regions has become increasingly critical for ensuring data reliability and scalability. Recent studies indicate that distributed databases exhibit significantly higher latency…
The proliferation of datasets across open data portals and enterprise data lakes presents an opportunity for deriving data-driven insights. Widely-used dataset search systems rely on keyword search over dataset metadata, including…
Traditional search applications within Research Data Management (RDM) ecosystems are crucial in helping users discover and explore the structured metadata from the research datasets. Typically, text search engines require users to submit…
The search for suitable datasets is the critical "first step" in data-driven research, but it remains a great challenge. Researchers often need to search for datasets based on high-level task descriptions. However, existing search systems…
We introduce graph pattern-based association rules (GPARs) for directed labeled multigraphs such as RDF graphs. GPARs support both generative tasks, where a graph is extended, and evaluative tasks, where the plausibility of a graph is…
While scoring nodes in graphs to understand their importance (e.g., in terms of centrality) has been investigated for decades, comparing nodes in property graphs based on their properties has not, to our knowledge, yet been addressed. In…
Modern applications frequently collect and analyze temporal data in the form of multivariate time series (MTS) -- time series that contain multiple channels. A common task in this context is subsequence search, which involves identifying…
Causal analyses derived from observational data underpin high-stakes decisions in domains such as healthcare, public policy, and economics. Yet such conclusions can be surprisingly fragile: even minor data errors - duplicate records, or…
Learned cardinality estimation requires accurate model designs to capture the local characteristics of probability distributions. However, existing models may fail to accurately capture complex, multilateral dependencies between attributes.…
We present the data model, design choices, and performance of ProvSQL, a general and easy-to-deploy provenance tracking and probabilistic database system implemented as a PostgreSQL extension. ProvSQL's data and query models closely reflect…
We propose a novel framework to enable Knowledge Graphs (KGs) sharing while ensuring that information that should remain private is not directly released nor indirectly exposed via derived knowledge, maintaining at the same time the…
Query evaluation over probabilistic databases is notoriously intractable -- not only in combined complexity, but often in data complexity as well. This motivates the study of approximation algorithms, and particularly of combined FPRASes,…
For decades, SQL has been the default language for composing queries, but it is increasingly used as an artifact to be read and verified rather than authored. With Large Language Models (LLMs), queries are increasingly machine-generated,…
Cardinality estimation (CE), the task of predicting the result size of queries is a critical component of query optimization. Accurate estimates are essential for generating efficient query execution plans. Recently, machine learning…
Relational Database Management Systems (RDBMS) manage complex, interrelated data and support a broad spectrum of analytical tasks. With the growing demand for predictive analytics, the deep integration of machine learning (ML) into RDBMS…
Diverse types of edge data, such as 2D geo-locations and 3D point clouds, are collected by sensors like lidar and GPS receivers on edge devices. On-device searches, such as k-nearest neighbor (kNN) search and radius search, are commonly…
Increasing demand for Large Language Models (LLMs) services imposes substantial deployment and computation costs on providers. LLM routing offers a cost-efficient solution by directing queries to the optimal LLM based on model and query…
The ISO standard Property Graph model has become increasingly popular for representing complex, interconnected data. However, it lacks native support for querying metadata and reification, which limits its abilities to deal with the demands…
Relational databases (RDBs) are widely used by corporations and governments to store multiple related tables. Their relational schemas pose unique challenges to synthetic data generation for privacy-preserving data sharing, e.g., for…
We introduce an automated method for structuring textual data into a model-agnostic schema, enabling alignment with any database model. It generates both a schema and its instance. Initially, textual data is represented as semantically…