Databases
Multi-cloud computing systems face significant challenges in ensuring acceptable performance while adhering to tenant budget requirements. This paper proposes a tenant budget-aware (tenant-centric) data replication framework for Multi-Cloud…
We initiate an investigation into how the fundamental concept of independence can be represented effectively in the presence of incomplete information in relational databases. The concepts of possible and certain independence are proposed, and…
Natural Language to SQL (NL2SQL) enables intuitive interactions with databases by transforming natural language queries into structured SQL statements. Despite recent advancements in enhancing human-computer interaction within database…
In many real-world applications such as social network analysis, knowledge graph discovery, biological network analytics, and so on, graph data management has become increasingly important and has drawn much attention from the database…
SQL/PGQ is the emerging ISO standard for querying property graphs defined as views over relational data. We formalize its expressive power across three fragments: the read-only core, the read-write extension, and an extended variant with…
Business process management is increasingly practiced using data-driven approaches. Still, classical imperative process models, which are typically formalized using Petri nets, are not straightforwardly applicable to the relational…
Entity resolution (ER) is the problem of identifying and linking database records that refer to the same real-world entity. Traditional ER methods use batch processing, which becomes impractical with growing data volumes due to high…
In this paper, we discuss a novel technique for processing correlated subqueries in SQL. The core idea is to isolate the non-correlated part of the predicate and use it to reduce the number of evaluations of the correlated part. We begin by…
Accurate query runtime prediction is a critical component of effective query optimization in modern database systems. Traditional cost models, such as those used in PostgreSQL, rely on static heuristics that often fail to reflect actual…
In a large database system, upper-bounding the cardinality of a join query is a crucial task called $\textit{pessimistic cardinality estimation}$. Recently, Abo Khamis, Nakos, Olteanu, and Suciu unified related works into the following…
DB engines produce efficient query execution plans by relying on cost models. Practical implementations estimate the cardinality of queries using heuristics, with magic numbers tuned to improve average performance on benchmarks. Empirically,…
In modern databases, the practice of data normalization continues to be important in improving data integrity, minimizing redundancies, and eliminating anomalies. However, since its inception and consequent improvements, there have been no…
The tensor programming abstraction is a foundational paradigm that allows users to write high-performance programs via a high-level imperative interface. Recent work on sparse tensor compilers has extended this paradigm to sparse tensors…
Machine learning models for clinical prediction rely on structured data extracted from Electronic Medical Records (EMRs), yet this process remains dominated by hardcoded, database-specific pipelines for cohort definition, feature selection,…
Large language models (LLMs) have shown strong performance in natural language to SQL (NL2SQL) tasks within general databases. However, extending to GeoSQL introduces additional complexity from spatial data types, function invocation, and…
This chapter presents a comprehensive taxonomy for assessing data quality in the context of data monetisation, developed through a systematic literature review. Organising over one hundred metrics and Key Performance Indicators (KPIs) into…
The goal of this study is to estimate the amount of lost data in electron microscopy and to analyze the extent to which experimentally acquired images are utilized in peer-reviewed scientific publications. Analysis of the number of images…
The FAIR Principles aim to make data and knowledge Findable, Accessible, Interoperable, and Reusable, yet current digital infrastructures often lack a unifying semantic framework that bridges human cognition and machine-actionability. In…
This paper introduces Experiversum, a lakehouse-based ecosystem that supports the curation, documentation and reproducibility of exploratory experiments. Experiversum enables structured research through iterative data cycles, while…
Error detection in relational databases is critical for maintaining data quality and is fundamental to tasks such as data cleaning and assessment. Current error detection studies mostly employ the multi-detector approach to handle…