数据库
Motivated by the challenges of implementing policy-based data access control (PBAC) under multiple simultaneously applicable compliance frameworks, we present Parajudica, an open, modular, and extensible RDF/SPARQL-based rule system for…
Temporal graphs are graphs whose nodes and edges, together with their associated properties, continuously change over time. With the development of Internet of Things (IoT) systems, a subclass of the temporal graph, i.e., Property Evolution…
Large Language Models (LLMs) are being increasingly used within data systems to process large datasets with text fields. A broad class of such tasks involves a semantic join-joining two tables based on a natural language predicate per pair…
In this short paper, we explore the enrichment of event logs with data from wearable devices. We discuss three approaches: (1) treating wearable data as event attributes, linking them directly to individual events, (2) treating wearable…
Vector search has been widely employed in recommender system and retrieval-augmented-generation pipelines, commonly performed with vector indexes to efficiently find similar items in large datasets. Recent growths in both data and task…
SPARQL query rewriting is a fundamental mechanism for uniformly querying heterogeneous ontologies in the Linked Data Web. However, the complexity of ontology alignments, particularly rich correspondences (c : c), makes this process…
Natural Language to SQL (i.e., NL2SQL) translation is crucial for democratizing database access, but even state-of-the-art models frequently generate semantically incorrect SQL queries, hindering the widespread adoption of these techniques…
The resilience problem for a query and an input set or bag database is to compute the minimum number of facts to remove from the database to make the query false. In this paper, we study how to compute the resilience of Regular Path Queries…
Translating users' natural language queries (NL) into SQL queries (i.e., Text-to-SQL, a.k.a. NL2SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of…
The State Database of a blockchain stores account data and enables authentication. Modern blockchains use fast consensus protocols to avoid forking, improving throughput and finality. However, Ethereum's StateDB was designed for a forking…
Data-sharing pipelines involve a series of stages that apply policy-based data transformations to enable secure and effective data exchange among organizations. Although numerous tools and platforms exist to manage governance and…
The academic evolution of process mining is moving toward object centric process mining, marking a significant shift in how processes are modeled and analyzed. IBM has developed its own distinctive approach called Multilevel Process Mining.…
Object-centric process mining requires structured data, but extracting it from unstructured text remains a challenge. We introduce ExOAR (Expert-Guided Object and Activity Recognition), an interactive method that combines large language…
Organizations struggle to share data across departments that have adopted different data analytics platforms. If n datasets must serve m environments, up to n*m replicas can emerge, increasing inconsistency and cost. Traditional warehouses…
Monitoring unstructured streams increasingly requires persistent, semantics-aware computation, yet today's LLM frameworks remain stateless and one-shot, limiting their usefulness for long-running analytics. We introduce Continuous Prompts…
Query rewriting is an effective technique for refining poorly written queries before they reach the query optimizer. However, manual rewriting is not scalable, as it is prone to errors and requires deep expertise. Traditional query…
The growing use of longitudinal university administrative records in data-driven decision-making often overlooks a critical layer: how raw, inconsistent data are normalised before modelling. This article presents a three-stage normalisation…
Social science research increasingly demands data-driven insights, yet researchers often face barriers such as lack of technical expertise, inconsistent data formats, and limited access to reliable datasets.Social science research…
Discovering which tables in large, heterogeneous repositories can be joined and by what transformations is a central challenge in data integration and data discovery. Traditional join discovery methods are largely designed for equi-joins,…
Prefill and decode (PD) disaggregation separates prompt prefill and token-by-token decode stages into distinct GPU pools and has become the dominant architecture for large-scale LLM serving in industry. Also, retrieval tasks via vector…