数据库
Large Language Models (LLMs) have demonstrated impressive capabilities across a range of natural language processing tasks. In particular, improvements in reasoning abilities and the expansion of context windows have opened new avenues for…
LSM-tree-based data stores are widely used in industry due to their exceptional performance. However, as data volumes grow, efficiently querying large-scale databases becomes increasingly challenging. To address this, recent studies…
In the wake of the recent resurgence of the Datalog language of databases, together with its extensions for ontological reasoning settings, this work aims to bridge the gap between the theoretical studies of DatalogMTL (Datalog extended…
Modern data-intensive applications increasingly store and process big-value items, such as multimedia objects and machine learning embeddings, which exacerbate storage inefficiencies in Log-Structured Merge-Tree (LSM)-based key-value…
Fatigue-induced crack growth is a leading cause of structural failure across critical industries such as aerospace, civil engineering, automotive, and energy. Accurate prediction of stress intensity factors (SIFs) -- the key parameters…
Integrated and efficient mobility requires data sharing among the involved stakeholders. In this direction, regulators and transport authorities have been defining policies to foster the digitalisation and online publication of mobility…
Equivalence checking of SQL queries is an intractable problem often encountered in settings ranging from grading SQL submissions to debugging query optimizers. Despite recent work toward developing practical solutions, only simple queries…
Exponential growth in data collection is creating significant challenges for data storage and analytics latency.Approximate Query Processing (AQP) has long been touted as a solution for accelerating analytics on large datasets, however,…
Data quality is fundamental to modern data science workflows, where data continuously flows as unbounded streams feeding critical downstream tasks, from elementary analytics to advanced artificial intelligence models. Existing data quality…
The Pandas API has been central to the success of pandas and its alternatives. Despite its importance, there is no benchmark for it, and we argue that we cannot repurpose existing benchmarks (from other domains) for the Pandas API. In this…
Semantic knowledge graphs are foundational to implementing the FAIR Principles, yet RDF/OWL representations often lack the semantic flexibility and cognitive interoperability required in scientific domains. We present a novel framework for…
Path queries are a core feature of modern graph query languages such as Cypher, SQL/PGQ, and GQL. These languages provide a rich set of features for matching paths, such as restricting to certain path modes (shortest, simple, trail) and…
Advances in storage technology have introduced Non-Volatile Memory, NVM, as a new storage medium. NVM, along with Dynamic Random Access Memory (DRAM), Solid State Disk (SSD), and Disk present a system designer with a wide array of options…
This paper draws attention to the potential of computational methods in reworking data generated in past qualitative studies. While qualitative inquiries often produce rich data through rigorous and resource-intensive processes, much of…
In recent work, we have shown that NVIDIA's raytracing cores on RTX video cards can be exploited to realize hardware-accelerated lookups for GPU-resident database indexes. On a high level, the concept materializes all keys as triangles in a…
We present TransClean, a method for detecting false positive predictions of entity matching algorithms under real-world conditions characterized by large-scale, noisy, and unlabeled multi-source datasets that undergo distributional shifts.…
Spatial join processing techniques that identify intersections between complex geometries (e.g., polygons) commonly follow a two-step filter-and-refine pipeline. The filter step evaluates the query predicate on the minimum bounding…
Lakehouse systems enable the same data to be queried with multiple execution engines. However, selecting the engine best suited to run a SQL query still requires a priori knowledge of the query computational requirements and an engine…
Entity Resolution (ER) is a fundamental data quality improvement task that identifies and links records referring to the same real-world entity. Traditional ER approaches often rely on pairwise comparisons, which can be costly in terms of…
Transforming relational databases into knowledge graphs with enriched ontologies enhances semantic interoperability and unlocks advanced graph-based learning and reasoning over data. However, previous approaches either demand significant…