Databases - Scifaro

OptBench: An Interactive Workbench for AI/ML-SQL Co-Optimization [Extended Demonstration Proposal]

Database workloads are increasingly nesting artificial intelligence (AI) and machine learning (ML) pipelines and AI/ML model inferences with data processing, yielding hybrid SQL+AI/ML queries that mix relational operators with expensive,…

Databases · Computer Science 2026-03-11 Jaykumar Tandel, Douglas Oscarson, Jia Zou

CEMR: An Effective Subgraph Matching Algorithm with Redundant Extension Elimination

Subgraph matching is a fundamental problem in graph analysis with a wide range of applications. However, due to its inherent NP-hardness, enumerating subgraph matches efficiently on large real-world graphs remains highly challenging. Most…

Databases · Computer Science 2026-03-11 Linglin Yang, Xunbin Su, Lei Zou, Xiangyang Gou, Yinnian Lin

Samyama: A Unified Graph-Vector Database with In-Database Optimization, Agentic Enrichment, and Hardware Acceleration

Modern data architectures are fragmented across graph databases, vector stores, analytics engines, and optimization solvers, resulting in complex ETL pipelines and synchronization overhead. We present Samyama, a high-performance…

Databases · Computer Science 2026-03-11 Madhulatha Mandarapu, Sandeep Kunkunuru

Modeling Concurrency Control as a Learnable Function

Concurrency control (CC) algorithms are important in modern transactional databases, as they enable high performance by executing transactions concurrently while ensuring correctness. However, state-of-the-art CC algorithms struggle to…

Databases · Computer Science 2026-03-11 Hexiang Pan, Shaofeng Cai, Tien Tuan Anh Dinh, Yuncheng Wu, Yeow Meng Chee, Gang Chen, Beng Chin Ooi

Towards Selecting the Informative Alternative Relational Query Plans for Database Education

Off-the-shelf RDBMS typically expose only the query execution plan (QEP) of an SQL query, without presenting information about representative alternative query plans (AQPs) considered during plan selection in a user-friendly manner.…

Databases · Computer Science 2026-03-11 Hu Wang, Hui Li, Sourav S Bhowmick, Zihao Ma

Query-Guided Analysis and Mitigation of Data Verification Errors (Extended Version)

Data verification, the process of labeling data items as correct or incorrect, is a preprocessing step that may critically affect the quality of results in data-driven pipelines. Despite recent advances, verification can still produce…

Databases · Computer Science 2026-03-10 Ran Schreiber, Yael Amsterdamer

LLM-Driven Online Aggregation for Unstructured Text Analytics

Large Language Models (LLMs) exhibit strong capabilities in text processing, and recent research has augmented SQL and DataFrame with LLM-powered semantic operators for data analysis. However, LLM-based data processing is hindered by slower…

Databases · Computer Science 2026-03-10 Chao Hui, Weizheng Lu, Yanjie Gao, Lingfeng Xiong, Yunhai Wang, Yueguo Chen

PRIME: Efficient Algorithm for Token Graph Routing Problem

Optimizing asset exchanges on blockchain-driven platforms poses a novel and challenging graph query optimization problem. In this model, assets represent vertices and exchanges form edges, recasting the graph query task as a routing problem…

Databases · Computer Science 2026-03-10 Haotian Xu, Yuqing Zhu, Yuming Huang, Jing Tang

Decomposition-Driven Multi-Table Retrieval and Reasoning for Numerical Question Answering

In this paper, we study the problem of numerical multi-table question answering (MTQA) over large-scale table collections (e.g., online data repositories). This task is essential in many analytical applications. Existing MTQA solutions,…

Databases · Computer Science 2026-03-10 Feng Luo, Hai Lan, Hui Luo, Zhifeng Bao, Xiaoli Wang, J. Shane Culpepper, Shazia Sadiq

GP-Tree: An in-memory spatial index combining adaptive grid cells with a prefix tree for efficient spatial querying

Efficient spatial indexing is crucial for processing large-scale spatial data. Traditional spatial indexes, such as STR-Tree and Quad-Tree, organize spatial objects based on coarse approximations, such as their minimum bounding rectangles…

Databases · Computer Science 2026-03-10 Xiangyang Yang, Xuefeng Guan, Lanxue Dang, Yi Xie, Qingyang Xu, Huayi Wu, Jiayao Wang

Dial: A Knowledge-Grounded Dialect-Specific NL2SQL System

Enterprises commonly deploy heterogeneous database systems, each of which owns a distinct SQL dialect with different syntax rules, built-in functions, and execution constraints. However, most existing NL2SQL methods assume a single dialect…

Databases · Computer Science 2026-03-10 Xiang Zhang, Hongming Xu, Le Zhou, Wei Zhou, Xuanhe Zhou, Guoliang Li, Yuyu Luo, Changdong Liu, Guorun Chen, Jiang Liao, Fan Wu

LLM-FK: Multi-Agent LLM Reasoning for Foreign Key Detection in Large-Scale Complex Databases

Detecting missing foreign keys (FKs) requires accurately modeling semantic dependencies across database schemas, which conventional heuristic-based methods are fundamentally limited in capturing. We propose LLM-FK, the first fully automated…

Databases · Computer Science 2026-03-10 Zijian Tang, Ying Zhang, Sibo Cai, Ruoxuan Wang

Novel Table Search [Technical Report]

Avoiding redundancy in query results has been extensively studied in relational databases and information retrieval, yet its implications for data lakes remain largely unexplored. We bridge this gap by investigating how to discover…

Databases · Computer Science 2026-03-10 Besat Kassaie, Renée J. Miller

The Fifth Graph Normal Form (5GNF): A Trait-Based Framework for Metadata Normalization in Property Graphs

Graph databases are widely used in systems that manage rich metadata, yet current modelling practices often embed descriptive attributes directly in nodes, leading to redundancy and inconsistent semantics. This paper introduces the Fifth…

Databases · Computer Science 2026-03-10 Yahya Sa'd, Vojtech Merunka, Renzo Angles

A Pipeline for ADNI Resting-State Functional MRI Processing and Quality Control

The Alzheimer's Disease Neuroimaging Initiative (ADNI) provides a comprehensive multimodal neuroimaging resource for studying aging and Alzheimer's disease (AD). Since its second wave, ADNI has increasingly collected resting-state…

Databases · Computer Science 2026-03-10 Saige Rutherford, Zeshawn Zahid, Robert C. Welsh, Andrea Avena-Koenigsberger, Vincent Koppelmans, Amanda F. Mejia

Cost Trade-offs of Reasoning and Non-Reasoning Large Language Models in Text-to-SQL

While Text-to-SQL systems achieve high accuracy, existing efficiency metrics like the Valid Efficiency Score prioritize execution time, a metric we show is fundamentally decoupled from consumption-based cloud billing. This paper evaluates…

Databases · Computer Science 2026-03-10 Saurabh Deochake, Debajyoti Mukhopadhyay

Towards Practical Benchmarking of Data Cleaning Techniques: On Generating Authentic Errors via Large Language Models

Data quality remains an important challenge in data-driven systems, as errors in tabular data can severely compromise downstream analytics and machine learning performance. Although numerous error detection algorithms have been proposed,…

Databases · Computer Science 2026-03-10 Xinyuan Liu, Jiahui Chen, Bocheng Hu, Yu Sun, Xinyang Chen, Shaoxu Song, Yongxin Tong

WikiDBGraph: A Data Management Benchmark Suite for Collaborative Learning over Database Silos

Relational databases are often fragmented across organizations, creating data silos that hinder distributed data management and mining. Collaborative learning (CL) -- techniques that enable multiple parties to train models jointly without…

Databases · Computer Science 2026-03-10 Zhaomin Wu, Ziyang Wang, Bingsheng He

HEXGEN-FLOW: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL

Recent advances in agentic large language models (LLMs) have substantially improved Text-to-SQL, enabling users without database expertise to query databases intuitively. However, deploying agentic LLM-based Text-to-SQL systems in…

Databases · Computer Science 2026-03-10 You Peng, Youhe Jiang, Wenqi Jiang, Chen Wang, Binhang Yuan

Tag-specific Regret Minimization Problem in Outdoor Advertising

Recently, out-of-home advertising has become a popular marketing technique, due to its higher return on investment. E-commerce houses approach the influence provider to achieve effective advertising through their tags (advertising content),…

Databases · Computer Science 2026-03-09 Dildar Ali, Abishek Salaria, Ansh Jasrotia, Suman Banerjee