Databases — Scifaro

A Decade of Systems for Human Data Interaction

Human-data interaction (HDI) presents fundamentally different challenges from traditional data management. HDI systems must meet latency, correctness, and consistency needs that stem from usability rather than query semantics; failing to…

Databases · Computer Science 2026-01-14 Eugene Wu, Yiru Chen, Haneen Mohammed, Zezhou Huang

Cache Coherence Over Disaggregated Memory

Disaggregating memory from compute offers the opportunity to better utilize stranded memory in cloud data centers. It is important to cache data in the compute nodes and maintain cache coherence across multiple compute nodes. However, the…

Databases · Computer Science 2026-01-14 Ruihong Wang, Jianguo Wang, Walid G. Aref

RAIRS: Optimizing Redundant Assignment and List Layout for IVF-Based ANN Search

IVF is one of the most widely used ANNS (Approximate Nearest Neighbor Search) methods in vector databases. The idea of redundant assignment is to assign a data vector to more than one IVF list to reduce the chance of missing true…

Databases · Computer Science 2026-01-13 Zehai Yang, Shimin Chen
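To make the redundant-assignment idea concrete, here is a minimal toy sketch, not the RAIRS method itself. It assumes fixed (untrained) centroids, squared L2 distance, and the simplest assignment rule: each vector goes to its r nearest lists. The helper names (sq_dist, nearest_lists, build_ivf, search) are hypothetical. The point it shows: with one list per vector and nprobe=1, a query near a list boundary can miss its true nearest neighbor, while storing each vector in two lists recovers it.

```python
# Hypothetical sketch: r-way redundant assignment in a toy IVF index.
# Assumptions: centroids are given (not trained); distance is squared L2.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_lists(vec, centroids, r):
    """Indices of the r centroids closest to vec (ties broken by index)."""
    ranked = sorted((sq_dist(vec, c), i) for i, c in enumerate(centroids))
    return [i for _, i in ranked[:r]]

def build_ivf(vectors, centroids, r=1):
    """Inverted lists; r=1 is plain IVF, r>1 stores each vector id in r lists."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        for li in nearest_lists(vec, centroids, r):
            lists[li].append(vid)
    return lists

def search(query, vectors, centroids, lists, nprobe=1, k=1):
    """Probe the nprobe closest lists, then rank the deduplicated candidates exactly."""
    candidates = set()
    for li in nearest_lists(query, centroids, nprobe):
        candidates.update(lists[li])  # a vector may sit in several probed lists
    ranked = sorted((sq_dist(query, vectors[v]), v) for v in candidates)
    return [v for _, v in ranked[:k]]

centroids = [(0.0, 0.0), (10.0, 0.0)]
vectors = [(4.0, 0.0), (5.2, 0.0), (0.0, 1.0), (10.0, 1.0)]
query = (4.9, 0.0)

plain = build_ivf(vectors, centroids, r=1)      # {0: [0, 2], 1: [1, 3]}
redundant = build_ivf(vectors, centroids, r=2)  # every id lands in both lists
print(search(query, vectors, centroids, plain))      # [0]: true nearest (id 1) is missed
print(search(query, vectors, centroids, redundant))  # [1]
```

The price of redundancy is index size and scan cost: here r=2 doubles every list, which is why the choice of secondary lists and their layout matters in practice.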

The Complexity of Finding Missing Answer Repairs

We investigate the problem of identifying database repairs for missing tuples in query answers. We show that when the query is part of the input (the combined complexity setting), determining whether or not a repair exists is…

Databases · Computer Science 2026-01-13 Jesse Comer, Val Tannen

Vextra: A Unified Middleware Abstraction for Heterogeneous Vector Database Systems

The rapid integration of vector search into AI applications, particularly for Retrieval Augmented Generation (RAG), has catalyzed the emergence of a diverse ecosystem of specialized vector databases. While this innovation offers a rich…

Databases · Computer Science 2026-01-13 Chandan Suri, Gursifath Bhasin

Algorithm Support for Graph Databases, Done Right

Graph database query languages cannot express algorithms like PageRank, forcing costly data wrangling, while existing solutions such as algorithm libraries, vertex-centric APIs, and recursive CTEs lack the necessary combination of…

Databases · Computer Science 2026-01-13 Daan de Graaf, Robert Brijder, Soham Chakraborty, George Fletcher, Bram van de Wall, Nikolay Yakovets

Reflective Reasoning for SQL Generation

Robust text-to-SQL over complex, real-world databases remains brittle even with modern LLMs: iterative refinement often introduces syntactic and semantic drift, corrections tend to be non-transferable across queries, and naive use of large…

Databases · Computer Science 2026-01-13 Isabelle Mohr, Joao Gandarela, John Dujany, Andre Freitas

Curator: Efficient Vector Search with Low-Selectivity Filters

Embedding-based dense retrieval has become the cornerstone of many critical applications, where approximate nearest neighbor search (ANNS) queries are often combined with filters on labels such as dates and price ranges. Graph-based indexes…

Databases · Computer Science 2026-01-13 Yicheng Jin, Yongji Wu, Wenjun Hu, Bruce M. Maggs, Jun Yang, Xiao Zhang, Danyang Zhuo

Database Theory in Action: Yannakakis' Algorithm

Yannakakis' seminal algorithm is optimal for acyclic joins, yet it has not been widely adopted due to its poor performance in practice. This paper briefly surveys recent advancements in making Yannakakis' algorithm more practical, in terms…

Databases · Computer Science 2026-01-13 Paraschos Koutris, Stijn Vansummeren, Qichen Wang, Yisu Remy Wang, Xiangyao Yu
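For readers unfamiliar with Yannakakis' algorithm, here is a minimal textbook-style sketch for one acyclic query, the path join R(A,B) ⋈ S(B,C) ⋈ T(C,D), rooted at R. It is not drawn from the survey's contents; the relation and attribute names are hypothetical, and relations are lists of dicts with hash-based semijoins. A bottom-up semijoin pass followed by a top-down pass (the "full reducer") deletes every dangling tuple, so the final joins never build intermediate tuples that fail to reach the output.

```python
# Hypothetical sketch of Yannakakis' algorithm on the path join R(A,B) - S(B,C) - T(C,D).

def key(t, attrs):
    """Projection of tuple t onto the join attributes."""
    return tuple(t[a] for a in attrs)

def semijoin(left, right, attrs):
    """left ⋉ right: keep tuples of left that match some tuple of right on attrs."""
    keys = {key(t, attrs) for t in right}
    return [t for t in left if key(t, attrs) in keys]

def hash_join(left, right, attrs):
    """Natural join on attrs using a hash index over right."""
    index = {}
    for u in right:
        index.setdefault(key(u, attrs), []).append(u)
    return [{**t, **u} for t in left for u in index.get(key(t, attrs), [])]

def yannakakis_path(R, S, T):
    # Bottom-up pass (leaves toward the root R): reduce each parent by its child.
    S = semijoin(S, T, ["C"])
    R = semijoin(R, S, ["B"])
    # Top-down pass (root toward leaves): reduce each child by its parent.
    S = semijoin(S, R, ["B"])
    T = semijoin(T, S, ["C"])
    # Every surviving tuple now contributes to the output, so the joins
    # produce no dangling intermediate results.
    return hash_join(hash_join(R, S, ["B"]), T, ["C"])

R = [{"A": 1, "B": 1}, {"A": 2, "B": 2}]
S = [{"B": 1, "C": 1}, {"B": 2, "C": 3}]
T = [{"C": 1, "D": "x"}, {"C": 2, "D": "y"}]
print(yannakakis_path(R, S, T))  # [{'A': 1, 'B': 1, 'C': 1, 'D': 'x'}]
```

The two semijoin passes touch each relation a constant number of times, which is where the algorithm's bound of input size plus output size for acyclic joins comes from; the practical overhead of those extra passes is the kind of cost the surveyed work aims to reduce.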

Database Views as Explanations for Relational Deep Learning

In recent years, there has been significant progress in the development of deep learning models over relational databases, including architectures based on heterogeneous graph neural networks (hetero-GNNs) and heterogeneous graph…

Databases · Computer Science 2026-01-13 Agapi Rissaki, Ilias Fountalis, Wolfgang Gatterbauer, Benny Kimelfeld

Frequency-Aware Graph Construction and Search for Dynamic Vector Databases

Approximate Nearest Neighbor Search (ANNS) is a crucial operation in databases and artificial intelligence. While graph-based ANNS methods like HNSW and NSG excel in performance, they assume uniform query distribution. However, in…

Databases · Computer Science 2026-01-13 Yifan Zhu, Ruijie Zhao, Zhonggen Li, Baihua Zheng, Zhikun Zhang, Zhaoqiang Chen, Congcong Ge

Reqo: A Comprehensive Learning-Based Cost Model for Robust and Explainable Query Optimization

Although machine learning (ML) shows potential in improving query optimization by generating and selecting more efficient plans, ensuring the robustness of learning-based cost models (LCMs) remains challenging. These LCMs currently lack…

Databases · Computer Science 2026-01-13 Baoming Chang, Amin Kamali, Verena Kantere

Enabling Personal Dataflow Sovereignty via Bolt-on Data Escrow

The digital economy is powered by a continuous and massive exchange of personal data. Individuals provide data to platforms in return for services, from social networking and search to health monitoring, entertainment, and access to LLMs.…

Databases · Computer Science 2026-01-13 Zhiru Zhu, Raul Castro Fernandez

The Importance of Parameters in Ranking Functions

How important is the weight of a given column in determining the ranking of tuples in a table? To address such an explanation question about a ranking function, we investigate the computation of SHAP scores for column weights, adopting a…

Databases · Computer Science 2026-01-12 Christoph Standke, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld
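As a rough illustration of what a Shapley (SHAP) score for a column weight can mean, here is a brute-force sketch. It is not the paper's formulation, whose exact setup is cut off above. Assumptions, all hypothetical: the ranking function is a linear weighted sum; a "player" is one column weight; a coalition keeps its own weights and resets all others to a baseline of 0; and the value of a coalition is the negated rank of one target tuple, so a better rank means a higher value. Exact Shapley values are then averaged marginal contributions over all subsets, which takes exponential time in the number of columns.

```python
# Hypothetical sketch: exact Shapley values of column weights for one tuple's rank.
from itertools import combinations
from math import factorial

def rank_of(target, table, weights):
    """1-based rank of table[target] under descending weighted-sum scores."""
    scores = [sum(w * x for w, x in zip(weights, row)) for row in table]
    # Ties are resolved in the target's favour: count only strictly higher scores.
    return 1 + sum(1 for s in scores if s > scores[target])

def shapley_column_weights(target, table, weights, baseline=0.0):
    n = len(weights)

    def value(coalition):
        # Weights outside the coalition are reset to `baseline` (an assumption).
        w = [weights[i] if i in coalition else baseline for i in range(n)]
        return -rank_of(target, table, w)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            coef = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += coef * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = [0.9, 0.1]
print(shapley_column_weights(0, table, weights))  # [1.0, -1.0]
```

In this toy table, the first column's weight lifts tuple 0 to the top while the second column's weight alone would push it to third place, so the two scores have opposite signs and sum to the total change in value (here 0), as the Shapley efficiency property requires.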

RISE: Rule-Driven SQL Dialect Translation via Query Reduction

Translating SQL dialects across different relational database management systems (RDBMSs) is crucial for migrating RDBMS-based applications to the cloud. Traditional SQL dialect translation tools rely on manually crafted rules,…

Databases · Computer Science 2026-01-12 Xudong Xie, Yuwei Zhang, Wensheng Dou, Yu Gao, Ziyu Cui, Jiansen Song, Rui Yang, Jun Wei

Task Cascades for Efficient Unstructured Data Processing

Modern database systems allow users to query or process unstructured text or document columns using LLM-powered functions. Users can express an operation in natural language (e.g., "identify if this review mentions billing issues"), with…

Databases · Computer Science 2026-01-12 Shreya Shankar, Sepanta Zeighami, Aditya Parameswaran

Parallel Dynamic Spatial Indexes

Maintaining spatial data (points in two or three dimensions) is crucial and has a wide range of applications, such as graphics, GIS, and robotics. To handle spatial data, many data structures, called spatial indexes, have been proposed,…

Databases · Computer Science 2026-01-12 Ziyang Men, Bo Huang, Yan Gu, Yihan Sun

AeroSketch: Near-Optimal Time Matrix Sketch Framework for Persistent, Sliding Window, and Distributed Streams

Many real-world matrix datasets arrive as high-throughput vector streams, making it impractical to store or process them in their entirety. To enable real-time analytics under limited computational, memory, and communication resources,…

Databases · Computer Science 2026-01-12 Hanyan Yin, Dongxie Wen, Jiajun Li, Zhewei Wei, Xiao Zhang, Peng Zhao, Zhi-Hua Zhou

QueryGym: Step-by-Step Interaction with Relational Databases

We introduce QueryGym, an interactive environment for building, testing, and evaluating LLM-based query planning agents. Existing frameworks often tie agents to specific query language dialects or obscure their reasoning; QueryGym instead…

Databases · Computer Science 2026-01-12 Haritha Ananthakrishnan, Harsha Kokel, Kelsey Sikes, Debarun Bhattacharjya, Michael Katz, Shirin Sohrabi, Kavitha Srinivas

LGTD: Local-Global Trend Decomposition for Season-Length-Free Time Series Analysis

Time series decomposition into trend, seasonal structure, and residual components is a core primitive for downstream analytics such as anomaly detection, change-point detection, and forecasting. However, most existing seasonal-trend…

Databases · Computer Science 2026-01-09 Chotanansub Sophaken, Thanadej Rattanakornphan, Piyanon Charoenpoonpanich, Thanapol Phungtua-eng, Chainarong Amornbunchornvej