Databases
Database management systems (DBMSs) are notoriously complex, making them difficult to test effectively, especially during early development when many features are incomplete. Traditional testing tools like SQLancer and SQLSmith are highly…
Exact subgraph matching on large-scale graphs remains a challenging problem due to high computational complexity and distributed system constraints. Existing GNN-based path embedding (GNN-PE) frameworks achieve efficient exact matching on…
Process discovery algorithms traditionally linearize events, failing to capture the inherent concurrency of real-world processes. While some techniques can handle partially ordered data, they often struggle with scalability on large event…
Growing privacy regulations and internal governance mandates are driving demand for fine-grained, context-sensitive access control in data management systems. Among competing approaches, content-based access control -- where access…
Agentic visual analytics (VA) represents an emerging class of systems in which large language model (LLM)-driven agents autonomously plan, execute, evaluate, and iterate across the full visual analytics pipeline. By shifting users from…
Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) has emerged as a promising paradigm for enhancing LLM reasoning by retrieving multi-hop paths from KGs. However, existing KG-RAG frameworks often underperform in real-world…
While Large Language Models (LLMs) demonstrate impressive proficiency in generating SQL queries, they fundamentally lack the capability to self-evaluate correctness without an execution oracle. This limitation creates a stark…
Log anomaly detection is crucial for uncovering system failures and security risks. Although logs originate from nested component executions with clear boundaries, this structure is lost when stored as flat sequences. As a result,…
High read and write performance is important for generic key-value stores, which are foundational to modern applications and databases. Yet, achieving high performance for mixed and dynamic workloads is challenging due to fundamental…
Regulated enterprises in the United States -- banks, telecommunications providers, large technology companies -- operate across heterogeneous systems that were rarely designed to interoperate. ERP platforms, billing engines, supply chain…
Public-private graphs, in which a public network is visible to everyone and each user also has a small private graph accessible only to that user, are common in real-world applications of social networks and financial…
Recent advances in query optimization have shifted from traditional rule-based and cost-based techniques towards machine learning-driven approaches. Among these, reinforcement learning (RL) has attracted significant attention due to its…
The growing volume of data in scientific domains has made spatial query processing increasingly challenging due to high data transfer costs across the memory hierarchy and limited memory bandwidth. To address these bottlenecks and reduce…
Object-centric process mining examines how processes interact with multiple co-evolving objects, and has gained great interest in recent years. However, object-centric event logs (OCELs) leave object relationships underspecified in several…
Urban areas are intricate systems shaped by socioeconomic, environmental, and infrastructural factors, with land use patterns serving as aspects of urban morphology. This paper proposes a novel methodology leveraging frequent item set…
Modern cloud-native platforms expose thousands of time-series metrics through systems like Prometheus, yet formulating correct queries in domain-specific languages such as PromQL remains a significant barrier for platform engineers and site…
Data-driven systems depend on task-relevant data, yet data collection pipelines remain passive and indiscriminate. Continuous logging of multimodal sensor streams incurs high storage costs and captures irrelevant data. This paper proposes a…
Natural Language to MongoDB Query Language (NL2MQL) is essential for democratizing access to modern document-centric databases. Unlike Text-to-SQL, NL2MQL faces unique challenges from MQL's procedural aggregation pipelines, deeply nested…
Semantic data harmonisation is a central requirement in the ILIAD project, where heterogeneous environmental data must be harmonised according to the Ocean Information Model (OIM), a modular family of ontologies for enabling the…
Table Structure Recognition (TSR) requires the logical reasoning ability of large language models (LLMs) to handle complex table layouts, but current datasets are limited in scale and quality, hindering effective use of this reasoning…