数据库
Data warehousing enables performant access to high-quality data integrated from dynamic data sources. The medallion architecture, a standard for data warehousing, addresses these goals by organizing data into bronze, silver and gold layers,…
Despite the many advantages that ontology-based data access (OBDA) has brought to a range of application domains, state-of-the-art OBDA systems still do not support popular graph database management systems such as Neo4j. Algorithms for…
Graph database systems store graph data as nodes and relationships, and utilize graph query languages (e.g., Cypher) for efficiently querying graph data. Proving the equivalence of graph queries is an important foundation for optimizing…
Models play a critical role in managing the vast amounts of data and increasing complexity found in the IoT, IIoT, and IoP domains. The Digital Shadow Reference Model, which serves as a foundational metadata schema for linking data and…
The use of narratives as a means of fusing information from knowledge graphs (KGs) into a coherent line of argumentation has been the subject of recent investigation. Narratives are especially useful in event-centric knowledge graphs in…
SPARQL query editors often lack intuitive interfaces to aid SPARQL-savvy users to write queries. To address this issue, we propose an easy-to-deploy, triple store-agnostic and open-source query editor that offers three main features: (i)…
Traditional data storage formats and databases often introduce complexities and inefficiencies that hinder rapid iteration and adaptability. To address these challenges, we introduce ParquetDB, a Python-based database framework that…
The growing interest in artificial intelligence has created workloads that require both sequential and random access. At the same time, NVMe-backed storage solutions have emerged, providing caching capability for large columnar datasets in…
Data quality (DQ) remains a fundamental concern in big data pipelines, especially when aggregations occur at multiple hierarchical levels. Traditional DQ validation rules often fail to scale or generalize across dimensions such as user…
The Deuteronomy transactional key-value store is unique architecturally in providing separation between transaction functionality -- its Transactional Component (TC) and data management -- its Data Component (DC). It is unique in technology…
Object-Centric Process Mining (OCPM) enables business process analysis from multiple perspectives. For example, an educational path can be examined from the viewpoints of students, teachers, and groups. This analysis depends on…
Hypertree decompositions provide a way to evaluate Conjunctive Queries (CQs) in polynomial time, where the exponent of this polynomial is determined by the width of the decomposition. In theory, the goal of efficient CQ evaluation therefore…
Many objects are represented as high-dimensional vectors nowadays. In this setting, the relevance between two objects (vectors) is usually evaluated by their inner product. Recently, item-centric searches, which search for users relevant to…
The $k$-MIPS ($k$ Maximum Inner Product Search) problem has been employed in many fields. Recently, its reverse version, the reverse $k$-MIPS problem, has been proposed. Given an item vector (i.e., query), it retrieves all user vectors such…
Without any doubt, the relational paradigm has been a huge success. At the same time, we believe that the time is ripe to rethink how database systems could look like if we designed them from scratch. Would we really end up with the same…
Business Intelligence (BI) plays a critical role in empowering modern enterprises to make informed data-driven decisions, and has grown into a billion-dollar business. Self-service BI tools like Power BI and Tableau have democratized the…
The thesis proposes the problem of representation heterogeneity to emphasize the fact that heterogeneity is an intrinsic property of any representation, wherein, different observers encode different representations of the same target…
Provenance information are essential for the traceability of scientific studies or experiments and thus crucial for ensuring the credibility and reproducibility of research findings. This paper discusses a comprehensive provenance framework…
On October 19 and 20, 2023, the authors of this report convened in Cambridge, MA, to discuss the state of the database research field, its recent accomplishments and ongoing challenges, and future directions for research and community…
Data-centric ML pipelines extend traditional machine learning (ML) pipelines -- of feature transformations and ML model training -- by outer loops for data cleaning, augmentation, and feature engineering to create high-quality input data.…