数据库
Prescriptive Process Monitoring (PresPM) is an emerging area within Process Mining, focused on optimizing processes through real-time interventions for effective decision-making. PresPM holds significant promise for organizations seeking…
Dynamic graph storage systems are essential for real-time applications such as social networks and recommendation, where graph data continuously evolves. However, they face significant challenges in efficiently handling concurrent read and…
In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful…
This paper investigates the feasibility of achieving zero-knowledge verifiability for graph databases, enabling database owners to cryptographically prove the query execution correctness without disclosing the underlying data. Although…
Query optimizers are crucial for the performance of database systems. Recently, many learned query optimizers (LQOs) have demonstrated significant performance improvements over traditional optimizers. However, most of them operate under a…
Despite growing interest in process analysis and mining for data-aware specifications, alignment-based conformance checking for declarative process models has focused on pure control-flow specifications, or mild data-aware extensions…
In the financial industry, data is the lifeblood of operations, and DBAs shoulder significant responsibilities for SQL tuning, database deployment, diagnosis, and service repair. In recent years, both database vendors and customers have…
The presence of concept drift poses challenges for anomaly detection in time series. While anomalies are caused by undesirable changes in the data, differentiating abnormal changes from varying normal behaviours is difficult due to…
We propose a theory that can determine the lowest isolation level that can be allocated to each transaction program in an application in a mixed-isolation-level setting, to guarantee that all executions will be serializable and thus…
We survey graph reachability indexing techniques for efficient processing of graph reachability queries in two types of popular graph models: plain graphs and edge-labeled graphs. Reachability queries are Boolean in nature, determining…
The past few years has witnessed specialized large language model (LLM) inference systems, such as vLLM, SGLang, Mooncake, and DeepFlow, alongside rapid LLM adoption via services like ChatGPT. Driving these system design efforts is the…
While high data quality (DQ) is critical for analytics, compliance, and AI performance, data quality management (DQM) remains a complex, resource-intensive, and often manual process. This study investigates the extent to which existing…
This version has been removed by arXiv administrators, as we were obligated to do, due to claimed copyright infringement by a third party
Process mining on business process execution data has focused primarily on orchestration-type processes performed in a single organization (intra-organizational). Collaborative (inter-organizational) processes, unlike those of orchestration…
Educational Process Mining (EPM) is a data analysis technique that is used to improve educational processes. It is based on Process Mining (PM), which involves gathering records (logs) of events to discover process models and analyze the…
A growing trend in the database and system communities is to augment conventional index structures, such as B+-trees, with machine learning (ML) models. Among these, error-bounded Piecewise Linear Approximation ($\epsilon$-PLA) has emerged…
Huawei's cloud-native database system GaussDB for MySQL (also known as Taurus) stores data in a separate storage layer consisting of a pool of storage servers. Each server has considerable compute power making it possible to push data…
Recent advances in graph databases (GDBs) have been driving interest in large-scale analytics, yet current systems fail to support higher-order (HO) interactions beyond first-order (one-hop) relations, which are crucial for tasks such as…
The healthcare industry is moving towards a patient-centric paradigm that requires advanced methods for managing and representing patient data. This paper presents a Patient Journey Ontology (PJO), a framework that aims to capture the…
We introduce a new dataset and algorithm for fast and efficient coastal distance calculations from Anywhere on Earth (AoE). Existing global coastal datasets are only available at coarse resolution (e.g. 1-4 km) which limits their utility.…