Databases - Scifaro

Semijoins of Annotated Relations

The semijoin operation is a fundamental operation of relational algebra that has been extensively used in query processing. Furthermore, semijoins have been used to formulate desirable properties of acyclic schemas; in particular, a schema…

Databases · Computer Science 2026-03-03 Phokion G. Kolaitis

S$^3$GND: An Effective Learning-Based Approach for Subgraph Similarity Search Under Generalized Neighbor Difference Semantics (Technical Report)

Subgraph similarity search over large-scale graphs is a fundamental task that retrieves subgraphs similar to a given query graph from a data graph, and it plays a crucial role in real applications such as protein discovery, social network…

Databases · Computer Science 2026-03-03 Qi Wen, Xiang Lian, Nan Zhang, Yutong Ye, Mingsong Chen

Beyond Single-Modal Analytics: A Framework for Integrating Heterogeneous LLM-Based Query Systems for Multi-Modal Data

With the increasing use of multi-modal data, semantic queries, an important way to access and analyze such data, are increasingly in demand in data management systems. As unstructured data, most information of…

Databases · Computer Science 2026-03-03 Ruyu Li, Tinghui Zhang, Haodi Ma, Daisy Zhe Wang, Yifan Wang

Gen-DBA: Generative Database Agents

Leveraging Machine Learning to optimize database systems, referred to as Machine Learning for Databases (ML4DB, for short), dates back to the early 1990s, spanning indexing techniques, selectivity estimation, and query optimization.…

Databases · Computer Science 2026-03-03 Yeasir Rayhan, Walid G. Aref

ReSearch: A Multi-Stage Machine Learning Framework for Earth Science Data Discovery

The rapid expansion of Earth Science data from satellite observations, reanalysis products, and numerical simulations has created a critical bottleneck in scientific discovery, namely identifying relevant datasets for a given research…

Databases · Computer Science 2026-03-03 Youran Sun, Yixin Wen, Haizhao Yang

BAMG: A Block-Aware Monotonic Graph Index for Disk-Based Approximate Nearest Neighbor Search

Approximate Nearest Neighbor Search (ANNS) over high-dimensional vectors is a foundational problem in databases, where disk I/O often emerges as the dominant performance bottleneck at scale. To accelerate search, graph-based indexes rely on…

Databases · Computer Science 2026-03-03 Huiling Li, Xin Huang, Byron Choi, Jianliang Xu

Breaking the Storage-Compute Bottleneck in Billion-Scale ANNS: A GPU-Driven Asynchronous I/O Framework

With the advancement of information retrieval, recommendation systems, and Retrieval-Augmented Generation (RAG), Approximate Nearest Neighbor Search (ANNS) has gained widespread application due to its higher performance and accuracy. While…

Databases · Computer Science 2026-03-03 Yang Xiao, Mo Sun, Ziyu Song, Bing Tian, Jie Zhang, Jie Sun, Zeke Wang

SQUiD: Synthesizing Relational Databases from Unstructured Text

Relational databases are central to modern data management, yet most data exists in unstructured forms like text documents. To bridge this gap, we leverage large language models (LLMs) to automatically synthesize a relational database by…

Databases · Computer Science 2026-03-03 Mushtari Sadia, Zhenning Yang, Yunming Xiao, Ang Chen, Amrita Roy Chowdhury

Efficiency of Analysis of Transitive Relations using Query-Driven, Ground-and-Solve, and Fact-Driven Inference

Logic rules allow analysis of complex relationships to be expressed easily, especially for transitive relations in critical applications. However, understanding and predicting the efficiency of different inference methods remain…

Databases · Computer Science 2026-03-03 Yanhong A. Liu, John Idogun, Scott D. Stoller, Yi Tong

Are Joins over LSM-Trees Ready? Take RocksDB as an Example

LSM-tree-based data stores are widely adopted in industries for their excellent performance. As data scales increase, disk-based join operations become indispensable yet costly for the database, making the selection of suitable join methods…

Databases · Computer Science 2026-03-03 Weiping Yu, Fan Wang, Xuwei Zhang, Siqiang Luo

NSHEDB: Noise-Sensitive Homomorphic Encrypted Database Query Engine

Homomorphic encryption (HE) enables computations directly on encrypted data, offering strong cryptographic guarantees for secure and privacy-preserving data storage and query execution. However, despite its theoretical power, practical…

Databases · Computer Science 2026-03-02 Boram Jung, Yuliang Li, Hung-Wei Tseng

GPU-Native Approximate Nearest Neighbor Search with IVF-RaBitQ: Fast Index Build and Search

Approximate nearest neighbor search (ANNS) on GPUs is gaining increasing popularity for modern retrieval and recommendation workloads that operate over massive high-dimensional vectors. Graph-based indexes deliver high recall and throughput…

Databases · Computer Science 2026-03-02 Jifan Shi, Jianyang Gao, James Xia, Tamás Béla Fehér, Cheng Long

OceanBase Bacchus: a High-Performance and Scalable Cloud-Native Shared Storage Architecture for Multi-Cloud

Although an increasing number of databases now embrace shared-storage architectures, current storage-disaggregated systems have yet to strike an optimal balance between cost and performance. In high-concurrency read/write scenarios,…

Databases · Computer Science 2026-03-02 Quanqing Xu, Mingqiang Zhuang, Chuanhui Yang, Quanwei Wan, Fusheng Han, Fanyu Kong, Hao Liu, Hu Xu, Junyu Ye

CACTUSDB: Unlock Co-Optimization Opportunities for SQL and AI/ML Inferences

There is a growing demand for supporting inference queries that combine Structured Query Language (SQL) and Artificial Intelligence / Machine Learning (AI/ML) model inferences in database systems, to avoid data denormalization and transfer,…

Databases · Computer Science 2026-03-02 Lixi Zhou, Kanchan Chowdhury, Lulu Xie, Jaykumar Tandel, Hong Guan, Zhiwei Fan, Xinwei Fu, Jia Zou

VISTA: Knowledge-Driven Vessel Trajectory Imputation with Repair Provenance

Repairing incomplete trajectory data is essential for downstream spatio-temporal applications. Yet, existing repair methods focus solely on reconstruction without documenting the reasoning behind repair decisions, undermining trust in…

Databases · Computer Science 2026-03-02 Hengyu Liu, Tianyi Li, Haoyu Wang, Kristian Torp, Tiancheng Zhang, Yushuai Li, Christian S. Jensen

Optimizing SSD-Resident Graph Indexing for High-Throughput Vector Search

Graph-based approximate nearest neighbor search (ANNS) methods (e.g., HNSW) have become the de facto state of the art for their high precision and low latency. To scale beyond main memory, recent out-of-memory ANNS systems leverage SSDs to…

Databases · Computer Science 2026-02-27 Weichen Zhao, Yuncheng Lu, Yao Tian, Hao Zhang, Jiehui Li, Minghao Zhao, Yakun Li, Weining Qian

Detecting Logic Bugs of Join Optimizations in DBMS

Generation-based testing techniques have shown their effectiveness in detecting logic bugs of DBMS, which are often caused by improper implementation of query optimizers. Nonetheless, existing generation-based debug tools are limited to…

Databases · Computer Science 2026-02-26 Xiu Tang, Sai Wu, Dongxiang Zhang, Feifei Li, Gang Chen

RAMSeS: Robust and Adaptive Model Selection for Time-Series Anomaly Detection Algorithms

Time-series data vary widely across domains, making a universal anomaly detector impractical. Methods that perform well on one dataset often fail to transfer because what counts as an anomaly is context dependent. The key challenge is to…

Databases · Computer Science 2026-02-26 Mohamed Abdelmaksoud, Sheng Ding, Andrey Morozov, Ziawasch Abedjan

Towards Autonomous Graph Data Analytics with Analytics-Augmented Generation

This paper argues that reliable end-to-end graph data analytics cannot be achieved by retrieval- or code-generation-centric LLM agents alone. Although large language models (LLMs) provide strong reasoning capabilities, practical graph…

Databases · Computer Science 2026-02-26 Qiange Wang, Chaoyi Chen, Jingqi Gao, Zihan Wang, Yanfeng Zhang, Ge Yu

RAC: Relation-Aware Cache Replacement for Large Language Models

The scaling of Large Language Model (LLM) services faces significant cost and latency challenges, making effective caching under tight capacity crucial. Existing cache replacement policies, from heuristics to learning-based methods,…

Databases · Computer Science 2026-02-26 Yuchong Wu, Zihuan Xu, Wangze Ni, Peng Cheng, Lei Chen, Xuemin Lin, Heng Tao Shen, Kui Ren