Databases
Streaming data collection is indispensable for stream data analysis, such as event monitoring. However, publishing these data directly leads to privacy leaks. $w$-event privacy is a valuable tool to protect individual privacy within a given…
Microservices architectures have become the foundation for developing scalable and modern software systems, but they also bring significant challenges in managing heterogeneous and distributed data. The pragmatic solution is polyglot…
In the context of the model-driven development of data-centric applications, OCL constraints play a major role in adding precision to the source models (e.g., data models and security models). Several code-generators have been proposed to…
For a given dataset $\mathcal{D}$ and structured label $f$, the goal of Filtered Approximate Nearest Neighbor Search (FANNS) algorithms is to find top-$k$ points closest to a query that satisfy label constraints, while ensuring both recall…
Traditional relational databases require users to manually specify join keys and assume exact matches between column names and values. In practice, this limits joinability across fragmented or inconsistently named tables. We propose a fuzzy…
Many data applications involve counting queries, where a client specifies a feasible range of variables and a database returns the corresponding item counts. A program that produces the counts of different queries often risks leaking…
This paper explores the evolving landscape of data spaces, focusing on key concepts, practical applications, and emerging future directions. It begins by introducing the foundational principles that underpin data space architectures,…
Research on learned cardinality estimation has made significant progress in recent years. However, existing methods still face distinct challenges that hinder their practical deployment in production environments. We define these challenges…
The database community lacks a unified relational query language for subset selection and optimisation queries, limiting both user expression and query optimiser reasoning about such problems. Decades of research (latterly under the rubric…
Database knob tuning is essential for optimizing the performance of modern database management systems, which often expose hundreds of knobs with continuous or categorical values. However, the large number of knobs and the vast…
Cultural heritage preservation faces significant challenges in managing diverse, multi-source, and multi-scale data for effective monitoring and conservation. This paper documents a comprehensive data historicity and migration framework…
Assessing data quality is crucial to knowing whether and how to use the data for different purposes. Specifically, given a collection of integrity constraints, various ways have been proposed to quantify the inconsistency of a database.…
Resistance distance computation is a fundamental problem in graph analysis, yet existing random walk-based methods are limited to approximate solutions and suffer from poor efficiency on small-treewidth graphs (e.g., road networks). In…
Efficiently computing group aggregations (i.e., GROUP BY) on modern architectures is critical for analytic database systems. Hash-based approaches in today's engines predominantly use a partitioned approach, in which incoming data is…
Efficient evaluation of regular expressions (regex, for short) is crucial for text analysis, and n-gram indexes are fundamental to achieving fast regex evaluation performance. However, these indexes face scalability challenges because of…
We propose KG-ER, a conceptual schema language for knowledge graphs that describes the structure of knowledge graphs independently of their representation (relational databases, property graphs, RDF) while helping to capture the semantics…
GPUs are uniquely suited to accelerate (SQL) analytics workloads thanks to their massive compute parallelism and High Bandwidth Memory (HBM): when datasets fit in the GPU HBM, performance is unparalleled. Unfortunately, GPU HBMs remain…
Database query languages such as SQL for relational databases and Cypher for graph databases have been widely adopted. Recent advancements in large language models (LLMs) enable natural language interactions with databases through models…
Knowledge graphs represent complex data using nodes, relationships, and properties. Cypher, a powerful query language for graph databases, enables efficient modeling and querying. Recent advancements in large language models allow…
Large language model (LLM) inference relies heavily on KV-caches to accelerate autoregressive decoding, but the resulting memory footprint grows rapidly with sequence length, posing significant efficiency challenges. Current KV-cache…