数据库
Skiplists have become prevalent in systems. The main advantages of skiplists are their simplicity and ease of implementation, and the ability to support operations in the same asymptotic complexities as their tree-based counterparts. In…
Data stream monitoring is a crucial task which has a wide range of applications. The majority of existing research in this area can be broadly classified into two types, monitoring value sum and monitoring value cardinality. In this paper,…
Due to the variety of its target use cases and the large API surface area to cover, a data lakehouse (DLH) is a natural candidate for a composable data system. Bauplan is a composable DLH built on "spare data parts" and a unified…
Interdisciplinary collaboration in battery science is required for rapid evaluation of better compositions and materials. However, diverging domain vocabulary and non-compatible experimental results slow down cooperation. We critically…
We study the problem of enumerating answers of Conjunctive Queries ranked according to a given ranking function. Our main contribution is a novel algorithm with small preprocessing time, logarithmic delay, and non-trivial space usage during…
Modern data stream applications demand memory-efficient solutions for accurately tracking frequent items, such as heavy hitters and heavy changers, under strict resource constraints. Traditional sketches face inherent accuracy-memory…
Efficient vector query processing is critical to enable AI applications at scale. Recent solutions struggle with growing vector datasets that exceed single-machine memory capacity, forcing unnecessary data movement and resource…
As database query processing techniques are being used to handle diverse workloads, a key emerging challenge is how to efficiently handle multi-way join queries containing multiple many-to-many joins. While uncommon in traditional…
Machine-learning from a disparate set of tables, a data lake, requires assembling features by merging and aggregating tables. Data discovery can extend autoML to data tables by automating these steps. We present an in-depth analysis of such…
The variety of data in data lakes presents significant challenges for data analytics, as data scientists must simultaneously analyze multi-modal data, including structured, semi-structured, and unstructured data. While Large Language Models…
Timeseries monitoring systems such as Prometheus play a crucial role in gaining observability of the underlying system components. These systems collect timeseries metrics from various system components and perform monitoring queries over…
We present a novel linear-time acyclic join algorithm, TreeTracker Join (TTJ). The algorithm can be understood as the pipelined binary hash join with a simple twist: upon a hash lookup failure, TTJ resets execution to the binding of the…
In this paper, we investigate the problem of evaluating Basic Graph Patterns (BGP, for short, a subclass of SPARQL queries) over dynamic Linked Data graphs; i.e., Linked Data graphs that are continuously updated. We consider a setting where…
Public procurement plays a critical role in government operations, ensuring the efficient allocation of resources and fostering economic growth. However, traditional procurement data is often stored in rigid, tabular formats, limiting its…
The FAIR principles are globally accepted guidelines for improved data management practices with the potential to align data spaces on a global scale. In practice, this is only marginally achieved through the different ways in which…
A Relational Database Management System (RDBMS) is one of the fundamental software that supports a wide range of applications, making it critical to identify bugs within these systems. There has been active research on testing RDBMS, most…
Modern artificial intelligence (AI) applications require large quantities of training and test data. This need creates critical challenges not only concerning the availability of such data, but also regarding its quality. For example,…
Cardinality estimation is a fundamental component in database systems, crucial for generating efficient execution plans. Despite advancements in learning-based cardinality estimation, existing methods may struggle to simultaneously optimize…
Multi-tenant architectures enhance the elasticity and resource utilization of NoSQL databases by allowing multiple tenants to co-locate and share resources. However, in large-scale cloud environments, the diverse and dynamic nature of…
Vector databases have emerged as a new type of systems that support efficient querying of high-dimensional vectors. Many of these offer their database as a service in the cloud. However, the variety of available CPUs and the lack of vector…