Databases - Scifaro

Hub Star Modeling 2.0 for Medallion Architecture

Data warehousing enables performant access to high-quality data integrated from dynamic data sources. The medallion architecture, a standard for data warehousing, addresses these goals by organizing data into bronze, silver and gold layers,…

Databases · Computer Science 2025-04-24 Shahram Salami

Towards Practicable Algorithms for Rewriting Graph Queries beyond DL-Lite

Despite the many advantages that ontology-based data access (OBDA) has brought to a range of application domains, state-of-the-art OBDA systems still do not support popular graph database management systems such as Neo4j. Algorithms for…

Databases · Computer Science 2025-04-24 Bianca Löhnert, Nikolaus Augsten, Cem Okulmus, Magdalena Ortiz

Proving Cypher Query Equivalence

Graph database systems store graph data as nodes and relationships, and utilize graph query languages (e.g., Cypher) for efficiently querying graph data. Proving the equivalence of graph queries is an important foundation for optimizing…

Databases · Computer Science 2025-04-23 Lei Tang, Wensheng Dou, Yingying Zheng, Lijie Xu, Wei Wang, Jun Wei, Tao Huang

Assessing FAIRness of the Digital Shadow Reference Model

Models play a critical role in managing the vast amounts of data and increasing complexity found in the IoT, IIoT, and IoP domains. The Digital Shadow Reference Model, which serves as a foundational metadata schema for linking data and…

Databases · Computer Science 2025-04-23 Johannes Theissen-Lipp

A Conceptual Model for Attributions in Event-Centric Knowledge Graphs

The use of narratives as a means of fusing information from knowledge graphs (KGs) into a coherent line of argumentation has been the subject of recent investigation. Narratives are especially useful in event-centric knowledge graphs in…

Databases · Computer Science 2025-04-23 Florian Plötzky, Katarina Britz, Wolf-Tilo Balke

A user-friendly SPARQL query editor powered by lightweight metadata

SPARQL query editors often lack intuitive interfaces that help SPARQL-savvy users write queries. To address this issue, we propose an easy-to-deploy, triple store-agnostic and open-source query editor that offers three main features: (i)…

Databases · Computer Science 2025-04-23 Vincent Emonet, Ana-Claudia Sima, Tarcisio Mendes de Farias

ParquetDB: A Lightweight Python Parquet-Based Database

Traditional data storage formats and databases often introduce complexities and inefficiencies that hinder rapid iteration and adaptability. To address these challenges, we introduce ParquetDB, a Python-based database framework that…

Databases · Computer Science 2025-04-23 Logan Lang, Eduardo Hernandez, Kamal Choudhary, Aldo H. Romero

Lance: Efficient Random Access in Columnar Storage through Adaptive Structural Encodings

The growing interest in artificial intelligence has created workloads that require both sequential and random access. At the same time, NVMe-backed storage solutions have emerged, providing caching capability for large columnar datasets in…

Databases · Computer Science 2025-04-22 Weston Pace, Chang She, Lei Xu, Will Jones, Albert Lockett, Jun Wang, Raunak Shah

Hierarchical Robust PCA for Scalable Data Quality Monitoring in Multi-level Aggregation Pipelines

Data quality (DQ) remains a fundamental concern in big data pipelines, especially when aggregations occur at multiple hierarchical levels. Traditional DQ validation rules often fail to scale or generalize across dimensions such as user…

Databases · Computer Science 2025-04-22 Preetam Kumar Ojha

Deuteronomy 2.0: Record Caching and Latch Freedom

The Deuteronomy transactional key-value store is unique architecturally in providing separation between transaction functionality -- its Transactional Component (TC) and data management -- its Data Component (DC). It is unique in technology…

Databases · Computer Science 2025-04-22 David Lomet

OCPM$^2$: Extending the Process Mining Methodology for Object-Centric Event Data Extraction

Object-Centric Process Mining (OCPM) enables business process analysis from multiple perspectives. For example, an educational path can be examined from the viewpoints of students, teachers, and groups. This analysis depends on…

Databases · Computer Science 2025-04-22 Najmeh Miri, Shahrzad Khayatbashi, Jelena Zdravkovic, Amin Jalali

Soft and Constrained Hypertree Width

Hypertree decompositions provide a way to evaluate Conjunctive Queries (CQs) in polynomial time, where the exponent of this polynomial is determined by the width of the decomposition. In theory, the goal of efficient CQ evaluation therefore…

Databases · Computer Science 2025-04-22 Matthias Lanzinger, Cem Okulmus, Reinhard Pichler, Alexander Selzer, Georg Gottlob

Approximate Reverse $k$-Ranks Queries in High Dimensions

Many objects are represented as high-dimensional vectors nowadays. In this setting, the relevance between two objects (vectors) is usually evaluated by their inner product. Recently, item-centric searches, which search for users relevant to…

Databases · Computer Science 2025-04-21 Daichi Amagata, Kazuyoshi Aoyama, Keito Kido, Sumio Fujita

How to Mine Potentially Popular Items? A Reverse MIPS-based Approach

The $k$-MIPS ($k$ Maximum Inner Product Search) problem has been employed in many fields. Recently, its reverse version, the reverse $k$-MIPS problem, has been proposed. Given an item vector (i.e., query), it retrieves all user vectors such…

Databases · Computer Science 2025-04-21 Daichi Amagata, Kazuyoshi Aoyama, Keito Kido, Sumio Fujita

How to get Rid of SQL, Relational Algebra, the Relational Model, ERM, and ORMs in a Single Paper -- A Thought Experiment

Without any doubt, the relational paradigm has been a huge success. At the same time, we believe that the time is ripe to rethink what database systems could look like if we designed them from scratch. Would we really end up with the same…

Databases · Computer Science 2025-04-18 Jens Dittrich

Auto-Prep: Holistic Prediction of Data Preparation Steps for Self-Service Business Intelligence

Business Intelligence (BI) plays a critical role in empowering modern enterprises to make informed data-driven decisions, and has grown into a billion-dollar business. Self-service BI tools like Power BI and Tableau have democratized the…

Databases · Computer Science 2025-04-17 Eugenie Y. Lai, Yeye He, Surajit Chaudhuri

Language and Knowledge Representation: A Stratified Approach

The thesis proposes the problem of representation heterogeneity to emphasize the fact that heterogeneity is an intrinsic property of any representation, wherein, different observers encode different representations of the same target…

Databases · Computer Science 2025-04-17 Mayukh Bagchi

Towards dimensions and granularity in a unified workflow and data provenance framework

Provenance information is essential for the traceability of scientific studies or experiments and thus crucial for ensuring the credibility and reproducibility of research findings. This paper discusses a comprehensive provenance framework…

Databases · Computer Science 2025-04-16 Tanja Auge, Sascha Genehr, Meike Klettke, Frank Krüger, Max Schröder

The Cambridge Report on Database Research

On October 19 and 20, 2023, the authors of this report convened in Cambridge, MA, to discuss the state of the database research field, its recent accomplishments and ongoing challenges, and future directions for research and community…

Databases · Computer Science 2025-04-16 Anastasia Ailamaki, Samuel Madden, Daniel Abadi, Gustavo Alonso, Sihem Amer-Yahia, Magdalena Balazinska, Philip A. Bernstein, Peter Boncz, Michael Cafarella, Surajit Chaudhuri, Susan Davidson, David DeWitt, Yanlei Diao, Xin Luna Dong, Michael Franklin, Juliana Freire, Johannes Gehrke, Alon Halevy, Joseph M. Hellerstein, Mark D. Hill, Stratos Idreos, Yannis Ioannidis, Christoph Koch, Donald Kossmann, Tim Kraska, Arun Kumar, Guoliang Li, Volker Markl, Renée Miller, C. Mohan, Thomas Neumann, Beng Chin Ooi, Fatma Ozcan, Aditya Parameswaran, Ippokratis Pandis, Jignesh M. Patel, Andrew Pavlo, Danica Porobic, Viktor Sanca, Michael Stonebraker, Julia Stoyanovich, Dan Suciu, Wang-Chiew Tan, Shiv Venkataraman, Matei Zaharia, Stanley B. Zdonik

Morphing-based Compression for Data-centric ML Pipelines

Data-centric ML pipelines extend traditional machine learning (ML) pipelines -- of feature transformations and ML model training -- by outer loops for data cleaning, augmentation, and feature engineering to create high-quality input data.…

Databases · Computer Science 2025-04-16 Sebastian Baunsgaard, Matthias Boehm