Databases
Transactional isolation guarantees are crucial for database correctness. However, recent studies have uncovered numerous isolation bugs in production databases. The common black-box approach to isolation checking stresses databases with…
Data discovery from data lakes is an essential application in modern data science. While many previous studies focused on improving the efficiency and effectiveness of data discovery, little attention has been paid to the usability of such…
Querying and exploring massive collections of data sources, such as data lakes, has been an essential research topic in the database community. Although much effort has been devoted to data discovery and data integration in data…
The text-to-SQL problem aims to translate natural language questions into SQL statements to ease the interaction between database systems and end users. Recently, Large Language Models (LLMs) have exhibited impressive capabilities in a…
B$^+$-trees are prevalent in traditional database systems due to their versatility and balanced structure. While binary search is typically utilized for branch operations, it may lead to inefficient cache utilization in main-memory…
We embark on a study of the consistent answers of queries over databases annotated with values from a naturally ordered positive semiring. In this setting, the consistent answers of a query are defined as the minimum of the semiring values…
Elegance of a database API matters. Frequently, database APIs suit the database designer, rather than the programmer's desire for elegance and efficiency. This article pursues both: firstly, by comparing the Lua APIs for two separate…
The digital transformation of our society is a constant challenge, as data is generated in almost every digital interaction. To use data effectively, it must be of high quality. This raises the question: what exactly is data quality? A…
Serializability (SER) and snapshot isolation (SI) are widely used transactional isolation levels in database systems. The isolation checking problem asks whether a given execution history of a database system satisfies a specified isolation…
TrueTime clocks (TTCs) that offer accurate and reliable time within limited uncertainty bounds have been increasingly implemented in many clouds. Multi-region data stores that seek decentralized synchronization for high performance…
The proliferation of location-based services has led to massive spatial data generation. Spatial join is a crucial database operation that identifies pairs of objects from two spatial datasets based on spatial relationships. Due to the…
Time series data captures properties that change over time. Such data occurs widely, ranging from the scientific and medical domains to the industrial and environmental domains. When the properties in time series exhibit spatial variations,…
Data spaces are evolving rapidly. In Europe, the concept of data spaces, which emphasises the importance of trust, sovereignty, and interoperability, is being implemented as a platform such as Catena-X. Meanwhile, Japan has been developing…
Decision making under uncertainty often requires choosing packages, or bags of tuples, that collectively optimize expected outcomes while limiting risks. Processing Stochastic Package Queries (SPQs) involves solving very large optimization…
Analyzing unstructured data has been a persistent challenge in data processing. Large Language Models (LLMs) have shown promise in this regard, leading to recent proposals for declarative frameworks for LLM-powered processing of…
Dialect translation plays a key role in enabling seamless interaction across heterogeneous database systems. However, translating SQL queries between different dialects (e.g., from PostgreSQL to MySQL) remains a challenging task due to…
As data volumes continue to grow rapidly, traditional search algorithms, such as the red-black tree and the B+ tree, face increasing performance challenges, especially in big data scenarios with intensive storage access. This paper presents the…
Feature management is essential for many online machine learning applications and can often become the performance bottleneck (e.g., taking up to 70% of the overall latency in a sales prediction service). Improper feature configurations…
This work studies Complex Event Recognition (CER) under time constraints regarding its query language, computational models, and streaming evaluation algorithms. We start by introducing an extension of Complex Event Logic (CEL), called…
We study the minimization problem for Conjunctive Regular Path Queries (CRPQs) and unions of CRPQs (UCRPQs). This is the problem of checking, given a query and a number $k$, whether the query is equivalent to one of size at most $k$. For…