Databases - Scifaro

Hierarchical Decomposition of Separable Workflow-Nets

The Partially Ordered Workflow Language (POWL) has recently emerged as a process modeling notation, offering strong quality guarantees and high expressiveness. While early versions of POWL relied on strict block-structured operators for…

Databases · Computer Science 2026-04-27 Humam Kourani, Gyunam Park, Wil M. P. van der Aalst

TiInsight: A SQL-based Automated Exploratory Data Analysis System through Large Language Models

The SQL-based exploratory data analysis has garnered significant attention within the data analysis community. The emergence of large language models (LLMs) has facilitated the paradigm shift from manual to automated data exploration.…

Databases · Computer Science 2026-04-27 Jun-Peng Zhu, Boyan Niu, Peng Cai, Zheming Ni, Kai Xu, Jiajun Huang, Shengbo Ma, Bing Wang, Xuan Zhou, Guanglei Bao, Donghui Zhang, Liu Tang, Qi Liu

HIRE: A Hybrid Learned Index for Robust and Efficient Performance under Mixed Workloads

Indexes are critical for efficient data retrieval and updates in modern databases. Recent advances in machine learning have led to the development of learned indexes, which model the cumulative distribution function of data to predict…

Databases · Computer Science 2026-04-27 Xinyi Zhang, Liang Liang, Anastasia Ailamaki, Jianliang Xu

An Alternate Agentic AI Architecture (It's About the Data)

For the last several years, the dominant narrative in "agentic AI" has been that large language models should orchestrate information access by dynamically selecting tools, issuing sub-queries, and synthesizing results. We argue this…

Databases · Computer Science 2026-04-24 Fabian Wenz, Felix Treutwein, Kai Arenja, Çagatay Demiralp, Michael Stonebraker

Scaling Worst-Case Optimal Datalog to GPUs

Datalog is a declarative logic-programming language used for complex analytic reasoning workloads such as program analysis and graph analytics. Datalog's popularity is due to its unique price-point, marrying logic-defined specification with…

Databases · Computer Science 2026-04-24 Yihao Sun, Kunting Qi, Thomas Gilray, Sidharth Kumar, Kristopher Micinski

iPDB -- Optimizing Semantic SQL Queries

Structured Query Language (SQL) has remained the standard query language for databases. SQL is highly optimized for processing structured data laid out in relations. Meanwhile, in the present application development landscape, it is highly…

Databases · Computer Science 2026-04-24 Udesh Kumarasinghe, Tyler Liu, Ahmed R. Mahmood, Chunwei Liu, Walid G. Aref

Making Transaction Isolation Checking Practical

Checking whether database transactions adhere to isolation levels is a crucial yet challenging problem. We present Boomslang, the first general-purpose checking framework capable of verifying configurations that were previously uncheckable.…

Databases · Computer Science 2026-04-23 Jian Zhang, Shuai Mu, Cheng Tan

Pre-Execution Query Slot-Time Prediction in Cloud Data Warehouses: A Feature-Scoped Machine Learning Approach

Cloud data warehouses bill compute based on slot-time consumed. In shared multi-tenant environments, query cost is highly variable and hard to estimate before execution, causing budget overruns and degraded scheduling. Static query-planner…

Databases · Computer Science 2026-04-23 Prashant Kumar Pathak

An Agentic Approach to Metadata Reasoning

As LLM-driven autonomous agents evolve to perform complex, multi-step tasks that require integrating multiple datasets, the problem of discovering relevant data sources becomes a key bottleneck. Beyond the challenge posed by the sheer…

Databases · Computer Science 2026-04-23 Jiani Zhang, Sercan O. Arik, Cosmin Arad, Fatma Ozcan, Alon Halevy

3DPipe: A Pipelined GPU Framework for Scalable Generalized Spatial Join over Polyhedral Objects

Spatial join is a fundamental operation in spatial databases. With the rapid growth of 3D data in applications such as LiDAR-based object detection and 3D digital pathology, there is an increasing need to support spatial join over 3D…

Databases · Computer Science 2026-04-23 Lyuheng Yuan, Da Yan, Akhlaque Ahmad, Fusheng Wang

Enabling Data Dependency-based Query Optimization

Primary key (PK) and foreign key (FK) constraints are widely used for query optimization. Knowledge about additional data dependencies, such as order dependencies, enables further substantial performance improvements. However, such…

Databases · Computer Science 2026-04-23 Daniel Lindner, Daniel Ritter, Felix Naumann

Demonstrating Online Schema Alignment in Decentralized Knowledge Graphs Querying

Decentralized Knowledge Graphs querying enables integrating distributed data without centralization, but is highly sensitive to vocabulary heterogeneity. Query issuers cannot realistically anticipate all vocabulary mismatches, especially…

Databases · Computer Science 2026-04-22 Bryan-Elliott Tam, Pieter Colpaert, Ruben Taelman

LIVE: Learnable Monotonic Vertex Embedding for Efficient Exact Subgraph Matching (Technical Report)

Exact subgraph matching is a fundamental graph operator that supports many graph analytics tasks, yet it remains computationally challenging due to its NP-completeness. Recent learning-based approaches accelerate query processing via…

Databases · Computer Science 2026-04-22 Yutong Ye, Weilong Ren, Yang Liu, Mengyi Yan, Ruijie Wang, Li Sun, Jianxin Li, Philip S. Yu

Heuristic Search Space Partitioning for Low-Latency Multi-Tenant Cloud Queries

Large-scale cloud security platforms must continuously query millions of structured cloud resource records distributed across thousands of tenant accounts. Broad, account-spanning queries saturate database infrastructure, producing P95…

Databases · Computer Science 2026-04-22 Prashant Kumar Pathak, Chandra Biksheswaran Mouleeswaran, Rama Teja Repaka

The Public Health and Environmental Surveillance Open Data Model (PHES-ODM) Version 3: An Open, Relational Data Model and Interoperability Framework for Wastewater Surveillance

Wastewater surveillance (WWS) has emerged as a valuable tool for public health surveillance, particularly since the COVID-19 pandemic. Its long-term utility is constrained, however, by fragmented data systems, inconsistent metadata…

Databases · Computer Science 2026-04-22 Mathew Thomson, Jean-David Therrien, Nikho Hizon, Janet Lin, Martin Wellman, Eugen-Sorin Sion, Carol Bennett, Peter Van Rolleghem, Douglas Manuel

vMODB: Unifying Event and Data Management for Distributed Asynchronous Applications

Event-driven microservice architecture (EDMA) has emerged as a crucial architectural pattern for scalable cloud applications. In typical EDMAs, database systems are relegated to isolated storage engines for individual components, blind to…

Databases · Computer Science 2026-04-22 Rodrigo Laigner, Yongluan Zhou

Direct Access for Answers to Conjunctive Queries with Aggregation

We study the fine-grained complexity of conjunctive queries with grouping and aggregation. For common aggregate functions (e.g., min, max, count, sum), such a query can be phrased as an ordinary conjunctive query over a database annotated…

Databases · Computer Science 2026-04-22 Idan Eldar, Nofar Carmeli, Benny Kimelfeld

Validating UTF-8 In Less Than One Instruction Per Byte

The majority of text is stored in UTF-8, which must be validated on ingestion. We present the lookup algorithm, which outperforms UTF-8 validation routines used in many libraries and languages by more than 10 times using commonly available…

Databases · Computer Science 2026-04-22 John Keiser, Daniel Lemire

BranchBench: Aligning Database Branching with Agentic Demands

Branchable databases are evolving from developer tools to infrastructure for agentic workloads characterized by speculative mutations and non-linear state exploration. Traditional RDBMS mechanisms such as nested transactions do not provide…

Databases · Computer Science 2026-04-21 Elaine Ang, Sam Weldon, In Keun Kim, Kevin Durand, Kostis Kaffes, Eugene Wu

FliX: Flipped-Indexing for Scalable GPU Queries and Updates

GPU-based concurrent data structures (CDSs) achieve high throughput for read-only queries, but efficient support for dynamic updates on fully GPU-resident data remains challenging. Ordered CDSs (e.g., B-trees and LSM-trees) maintain an…

Databases · Computer Science 2026-04-21 Rosina Kharal, Trevor Brown, Justus Henneberg, Felix Schuhknecht