Databases - Scifaro

DIRT: Database-Integrated Random Testing

Database management systems (DBMSs) are notoriously complex, making them difficult to test effectively, especially during early development when many features are incomplete. Traditional testing tools like SQLancer and SQLSmith are highly…

Databases · Computer Science 2026-04-21 Alperen Keles, Ethan Chou, Harrison Goldstein, Leonidas Lampropoulos

Efficient Distributed Exact Subgraph Matching via GNN-PE: Load Balancing, Cache Optimization, and Query Plan Ranking

Exact subgraph matching on large-scale graphs remains a challenging problem due to high computational complexity and distributed system constraints. Existing GNN-based path embedding (GNN-PE) frameworks achieve efficient exact matching on…

Databases · Computer Science 2026-04-21 Yu Wang, Hui Wang, Jiake Ge, Xin Wang

Revealing Inherent Concurrency in Event Data: A Partial Order Approach to Process Discovery

Process discovery algorithms traditionally linearize events, failing to capture the inherent concurrency of real-world processes. While some techniques can handle partially ordered data, they often struggle with scalability on large event…

Databases · Computer Science 2026-04-21 Humam Kourani, Gyunam Park, Wil M. P. van der Aalst

Compliance in Databases: A Study of Structural Policies and Query Optimization

Growing privacy regulations and internal governance mandates are driving demand for fine-grained, context-sensitive access control in data management systems. Among competing approaches, content-based access control -- where access…

Databases · Computer Science 2026-04-20 Ahana Pradhan, Srinivas Karthik, Imtiyazuddin Shaik, Srinivas Vivek

Exploring Agentic Visual Analytics: A Co-Evolutionary Framework of Roles and Workflows

Agentic visual analytics (VA) represents an emerging class of systems in which large language model (LLM)-driven agents autonomously plan, execute, evaluate, and iterate across the full visual analytics pipeline. By shifting users from…

Databases · Computer Science 2026-04-20 Tianqi Luo, Leixian Shen, Yuyu Luo

EvoRAG: Making Knowledge Graph-based RAG Automatically Evolve through Feedback-driven Backpropagation

Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) has emerged as a promising paradigm for enhancing LLM reasoning by retrieving multi-hop paths from KGs. However, existing KG-RAG frameworks often underperform in real-world…

Databases · Computer Science 2026-04-20 Zhenbo Fu, Yuanzhe Zhang, Qiange Wang, Hao Yuan, Yuehao Xu, Enze Yi, Yanfeng Zhang, Ge Yu

DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency

While Large Language Models (LLMs) demonstrate impressive proficiency in generating SQL queries, they fundamentally lack the capability to self-evaluate correctness without an execution oracle. This limitation creates a stark…

Databases · Computer Science 2026-04-20 Boyan Li, Ou Ocean Kun Hei, Yue Yu, Yuyu Luo

KRONE: Scalable LLM-Augmented Log Anomaly Detection via Hierarchical Abstraction

Log anomaly detection is crucial for uncovering system failures and security risks. Although logs originate from nested component executions with clear boundaries, this structure is lost when stored as flat sequences. As a result,…

Databases · Computer Science 2026-04-20 Lei Ma, Jinyang Liu, Tieying Zhang, Peter M. VanNostrand, Dennis M. Hofmann, Lei Cao, Elke A. Rundensteiner, Jianjun Chen

Dynamic read & write optimization with TurtleKV

High read and write performance is important for generic key-value stores, which are foundational to modern applications and databases. Yet, achieving high performance for mixed and dynamic workloads is challenging due to fundamental…

Databases · Computer Science 2026-04-20 Tony Astolfi, Vidya Silai, Darby Huye, Lan Liu, Raja R. Sambasivan, Johes Bater

Data Engineering Patterns for Cross-System Reconciliation in Regulated Enterprises: Architecture, Anomaly Detection, and Governance

Regulated enterprises in the United States -- banks, telecommunications providers, large technology companies -- operate across heterogeneous systems that were rarely designed to interoperate. ERP platforms, billing engines, supply chain…

Databases · Computer Science 2026-04-17 Zhijun Qiu

Efficient Community Search on Attributed Public-Private Graphs

Public-private graph, where a public network is visible to everyone and every user is also associated with its own small private graph accessed by itself only, widely exists in real-world applications of social networks and financial…

Databases · Computer Science 2026-04-17 Yuqi Chen, Weihan Zhang, Xin Huang

RELOAD: A Robust and Efficient Learned Query Optimizer for Database Systems

Recent advances in query optimization have shifted from traditional rule-based and cost-based techniques towards machine learning-driven approaches. Among these, reinforcement learning (RL) has attracted significant attention due to its…

Databases · Computer Science 2026-04-17 Seokwon Lee, Jaeyoung Sim, Sihyun Kim, Yuhsing Li, Yiwen Zhu, Kwanghyun Park

Parallel R-tree-based Spatial Query Processing on a Commercial Processing-in-Memory System

The growing volume of data in scientific domains has made spatial query processing increasingly challenging due to high data transfer costs across the memory hierarchy and limited memory bandwidth. To address these bottlenecks and reduce…

Databases · Computer Science 2026-04-17 Tasmia Jannat, Michael Gowanlock, Satish Puri

Detecting Dynamic Relationships in Object-Centric Event Logs

Object-centric process mining examines how processes interact with multiple co-evolving objects, and has gained great interest in recent years. However, object-centric event logs (OCELs) leave object relationships underspecified in several…

Databases · Computer Science 2026-04-16 Alessandro Gianola, Zeeshan Hameed, Marco Montali, Anjo Seidel, Mathias Weske, Sarah Winkler

Exploring Urban Land Use Patterns by Pattern Mining and Unsupervised Learning

Urban areas are intricate systems shaped by socioeconomic, environmental, and infrastructural factors, with land use patterns serving as aspects of urban morphology. This paper proposes a novel methodology leveraging frequent item set…

Databases · Computer Science 2026-04-16 Zdena Dobesova, Tai Dinh, Pavel Novak

From Natural Language to PromQL: A Catalog-Driven Framework with Dynamic Temporal Resolution for Cloud-Native Observability

Modern cloud-native platforms expose thousands of time series metrics through systems like Prometheus, yet formulating correct queries in domain-specific languages such as PromQL remains a significant barrier for platform engineers and site…

Databases · Computer Science 2026-04-16 Twinkll Sisodia

A Domain-Specific Language for LLM-Driven Trigger Generation in Multimodal Data Collection

Data-driven systems depend on task-relevant data, yet data collection pipelines remain passive and indiscriminate. Continuous logging of multimodal sensor streams incurs high storage costs and captures irrelevant data. This paper proposes a…

Databases · Computer Science 2026-04-16 Philipp Reis, Philipp Rigoll, Martin Zehetner, Jacqueline Henle, Stefan Otten, Eric Sax

Draft-Refine-Optimize: Self-Evolved Learning for Natural Language to MongoDB Query Generation

Natural Language to MongoDB Query Language (NL2MQL) is essential for democratizing access to modern document-centric databases. Unlike Text-to-SQL, NL2MQL faces unique challenges from MQL's procedural aggregation pipelines, deeply nested…

Databases · Computer Science 2026-04-16 Mingwei Ye, Jiaxi Zhuang, Mingjun Xu, Linfeng Zhang, Guolin Ke, Hengxing Cai

A Pythonic Functional Approach for Semantic Data Harmonisation in the ILIAD Project

Semantic data harmonisation is a central requirement in the ILIAD project, where heterogeneous environmental data must be harmonised according to the Ocean Information Model (OIM), a modular family of ontologies for enabling the…

Databases · Computer Science 2026-04-16 Erik Johan Nystad, Francisco Martín-Recuerda

TableNet: A Large-Scale Table Dataset with LLM-Powered Autonomous

Table Structure Recognition (TSR) requires the logical reasoning ability of large language models (LLMs) to handle complex table layouts, but current datasets are limited in scale and quality, hindering effective use of this reasoning…

Databases · Computer Science 2026-04-16 Ruilin Zhang, Kai Yang