数据库
Knowledge Graphs (KGs) are increasingly adopted as a foundational technology for integrating heterogeneous data in domains such as climate science, cultural heritage, and the life sciences. Declarative mapping languages like R2RML and RML…
Provenance-based data skipping compactly over-approximates the provenance of a query using so-called provenance sketches and utilizes such sketches to speed-up the execution of subsequent queries by skipping irrelevant data. However, a…
Join processing is a fundamental operation in database management systems; however, traditional join algorithms often encounter efficiency challenges when dealing with complex queries that produce intermediate results much larger than the…
Minimizing data-to-analysis time while enabling real-time interaction and efficient analytical computations on large datasets are fundamental objectives of contemporary exploratory systems. Although some of the recent adaptive indexing and…
The global issue of overcrowding in emergency departments (ED) necessitates the analysis of patient flow through ED to enhance efficiency and alleviate overcrowding. However, traditional analytical methods are time-consuming and costly. The…
NL2SQL (natural language to SQL) systems translate natural language into SQL queries, allowing users with no technical background to interact with databases and create tools like reports or visualizations. While recent advancements in large…
The integration of heterogeneous databases into a unified querying framework remains a critical challenge, particularly in resource-constrained environments. This paper presents a novel Small Language Model(SLM)-driven system that…
Modern hardware architectures, e.g., NUMA servers, chiplet processors, tiered and disaggregated memory systems have significantly improved the performance of Main-Memory Databases, and are poised to deliver further improvements in the…
Query optimizers in RDBMSs search for execution plans expected to be optimal for given queries. They use parameter estimates, often inaccurate, and make assumptions that may not hold in practice. Consequently, they may select plans that are…
Linked Data and labelled property graphs (LPG) are two data management approaches with complementary strengths and weaknesses, making their integration beneficial for sharing datasets and supporting software ecosystems. In this paper, we…
Effective evaluation of web data record extraction methods is crucial, yet hampered by static, domain-specific benchmarks and opaque scoring practices. This makes fair comparison between traditional algorithmic techniques, which rely on…
Acyclic join queries can be evaluated instance-optimally using Yannakakis' algorithm, which avoids needlessly large intermediate results through semi-join passes. Recent work proposes to address the significant hidden constant factors…
In recent years, trace generation has emerged as a significant challenge within the Process Mining community. Deep Learning (DL) models have demonstrated accuracy in reproducing the features of the selected processes. However, current DL…
Joins are among the most time-consuming and data-intensive operations in relational query processing. Much research effort has been applied to the optimization of join processing due to their frequent execution. Recent studies have shown…
The article addresses the problem of storing data in extreme environmental conditions with limited computing resources and memory. There is a requirement to create portable, fault-tolerant, modular database management systems (DBMS) that…
Machine learning has demonstrated transformative potential for database operations, such as query optimization and in-database data analytics. However, dynamic database environments, characterized by frequent updates and evolving data…
Text-to-SQL (Text2SQL) aims to map natural language questions to executable SQL queries. Although large language models (LLMs) have driven significant progress, current approaches struggle with poor transferability to open-source LLMs,…
Prescriptive Analytics (PSA), an emerging business analytics field suggesting concrete options for solving business problems, has seen an increasing amount of interest after more than a decade of multidisciplinary research. This paper is a…
In every enterprise database, administrators must define an access control policy that specifies which users have access to which tables. Access control straddles two worlds: policy (organization-level principles that define who should have…
Azure Cosmos DB is a cloud-native distributed database, operating at a massive scale, powering Microsoft Cloud. Think 10s of millions of database partitions (replica-sets), 100+ PBs of data under management, 20M+ vCores. Failovers are an…