Databases - Scifaro

Towards Effective Orchestration of AI x DB Workloads

AI-driven analytics are increasingly crucial to data-centric decision-making. The practice of exporting data to machine learning runtimes incurs high overhead, limits robustness to data drift, and expands the attack surface, especially in…

Databases · Computer Science 2026-03-05 Naili Xing, Haotian Gao, Zhanhao Zhao, Shaofeng Cai, Zhaojing Luo, Yuncheng Wu, Zhongle Xie, Meihui Zhang, Beng Chin Ooi

GraphLake: A Purpose-Built Graph Compute Engine for Lakehouse

In this paper, we introduce GraphLake, a purpose-built graph compute engine for Lakehouse. GraphLake is built on top of the commercial graph database TigerGraph. It maps Lakehouse tables to vertex and edge types in a labeled property graph…

Databases · Computer Science 2026-03-05 Shige Liu, Songting Chen, Chengjie Qin, Mingxi Wu, Jianguo Wang

SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification

Community-driven Text-to-SQL evaluation platforms play a pivotal role in tracking the state of the art of Text-to-SQL performance. The reliability of the evaluation process is critical for driving progress in the field. Current evaluation…

Databases · Computer Science 2026-03-05 Rocky Klopfenstein, Yang He, Andrew Tremante, Yuepeng Wang, Nina Narodytska, Haoze Wu

LQRS: Learned Query Re-optimization Framework for Spark SQL

The query optimizer is a fundamental component of database management systems. Recent studies have shown that learned query optimizers outperform traditional cost-based query optimizers. However, they fail to exploit valuable runtime…

Databases · Computer Science 2026-03-05 Jiahao He, Yutao Cui, Cuiping Li, Jikang Jiang, Yuheng Hou, Hong Chen

TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs

In this paper, we introduce TigerVector, a system that integrates vector search and graph query within TigerGraph, a Massively Parallel Processing (MPP) native graph database. We extend the vertex attribute type with the embedding type. To…

Databases · Computer Science 2026-03-05 Shige Liu, Zhifang Zeng, Li Chen, Adil Ainihaer, Arun Ramasami, Songting Chen, Yu Xu, Mingxi Wu, Jianguo Wang

Virtual-Memory Assisted Buffer Management In Tiered Memory

Tiered memory architectures have gained significant traction in the database community in recent years. In these architectures, the on-chip DRAM of the host processor is typically referred to as local memory, and forms the primary tier.…

Databases · Computer Science 2026-03-04 Yeasir Rayhan, Walid G. Aref

Cross-Layer Decision Timing Orchestration in Cost-Based Database Systems: Resolving Structural Temporal Misalignment

This paper analyzes execution instability in traditional cost-based database management systems (DBMS) and identifies a structural timing misalignment between optimization and execution stages that contributes to tail-latency amplification.…

Databases · Computer Science 2026-03-04 Ilsun Chang

HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval

Table-text retrieval aims to retrieve relevant tables and text to support open-domain question answering. Existing studies use either early or late fusion, but face limitations. Early fusion pre-aligns a table row with its associated…

Databases · Computer Science 2026-03-04 Sungho Park, Joohyung Yun, Jongwuk Lee, Wook-Shin Han

GLEAN: Grounded Lightweight Evaluation Anchors for Contamination-Aware Tabular Reasoning

Tabular reasoning benchmarks mix semantic inference, numerical computation, and brittle table formatting, yet evaluations for small models remain vulnerable to contamination, dataset artifacts, and retrieval failures. We propose GLEAN, a…

Databases · Computer Science 2026-03-04 Qizhi Wang

LinkML: An Open Data Modeling Framework

Scientific research relies on well-structured, standardized data; however, much of it is stored in formats such as free-text lab notebooks, non-standardized spreadsheets, or data repositories. This lack of structure challenges…

Databases · Computer Science 2026-03-04 Sierra A. T. Moxon, Harold Solbrig, Nomi L. Harris, Patrick Kalita, Mark A. Miller, Sujay Patil, Kevin Schaper, Chris Bizon, J. Harry Caufield, Silvano Cirujano Cuesta, Corey Cox, Frank Dekervel, Damion M. Dooley, William D. Duncan, Tim Fliss, Sarah Gehrke, Adam S. L. Graefe, Harshad Hegde, AJ Ireland, Julius O. B. Jacobsen, Madan Krishnamurthy, Carlo Kroll, David Linke, Ryan Ly, Nicolas Matentzoglu, James A. Overton, Jonny L. Saunders, Deepak R. Unni, Gaurav Vaidya, Wouter-Michiel A. M. Vierdag, LinkML Community Contributors, Oliver Ruebel, Christopher G. Chute, Matthew H. Brush, Melissa A. Haendel, Christopher J. Mungall

Catapults to the Rescue: Accelerating Vector Search by Exploiting Query Locality

Graph-based indexing is the dominant approach for approximate nearest neighbor search in vector databases, offering high recall with low latency across billions of vectors. However, in such indices, the edge set of the proximity graph is…

Databases · Computer Science 2026-03-03 Sami Abuzakuk, Anne-Marie Kermarrec, Rafael Pires, Mathis Randl, Martijn de Vos

Milliscale: Fast Commit on Low-Latency Object Storage

With millisecond-level latency and support for mutable objects, recent low-latency object storage services as represented by Amazon S3 Express One Zone have become an attractive option for OLTP engines to directly commit transactions and…

Databases · Computer Science 2026-03-03 Jiatang Zhou, Kaisong Huang, Tianzheng Wang

GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered

Traditional query processing relies on engines that are carefully optimized and engineered by many experts. However, new techniques and user requirements evolve rapidly, and existing systems often cannot keep pace. At the same time, these…

Databases · Computer Science 2026-03-03 Jiale Lao, Immanuel Trummer

Bespoke OLAP: Synthesizing Workload-Specific One-size-fits-one Database Engines

Modern OLAP engines are designed to support arbitrary analytical workloads, but this generality incurs structural overhead, including runtime schema interpretation, indirection layers, and abstraction boundaries, even in highly optimized…

Databases · Computer Science 2026-03-03 Johannes Wehrstein, Timo Eckmann, Matthias Jasny, Carsten Binnig

Disk-Resident Graph ANN Search: An Experimental Evaluation

As data volumes grow while memory capacity remains limited, disk-resident graph-based approximate nearest neighbor (ANN) methods have become a practical alternative to memory-resident designs, shifting the bottleneck from computation to…

Databases · Computer Science 2026-03-03 Xiaoyu Chen, Jinxiu Qu, Yitong Song, Shuhang Lu, Huiling Li, Minghui Jiang, Wei Zhou, Jianliang Xu, Xuanhe Zhou, Fan Wu

Adversarial Query Synthesis via Bayesian Optimization

Benchmark workloads are extremely important to the database management research community, especially as more machine learning components are integrated into database systems. Here, we propose a Bayesian optimization technique to…

Databases · Computer Science 2026-03-03 Jeffrey Tao, Yimeng Zeng, Haydn Thomas Jones, Natalie Maus, Osbert Bastani, Jacob R. Gardner, Ryan Marcus

VectorMaton: Efficient Vector Search with Pattern Constraints via an Enhanced Suffix Automaton

Approximate nearest neighbor search (ANNS) has become a cornerstone in modern vector database systems. Given a query vector, ANNS retrieves the closest vectors from a set of base vectors. In real-world applications, vectors are often…

Databases · Computer Science 2026-03-03 Haoxuan Xie, Siqiang Luo

A Framework for Transparent Reporting of Data Quality Analysis Across the Clinical Electronic Health Record Data Lifecycle

Data quality (DQ) and transparency of secondary data are critical factors that delay the adoption of clinical AI models and affect clinician trust in them. Many DQ studies fail to clarify where, along the lifecycle, quality checks occur,…

Databases · Computer Science 2026-03-03 Melinda Wassell, Kerryn Butler-Henderson, Karin Verspoor

A Tree-Structured Two-Phase Commit Framework for OceanBase: Optimizing Scalability and Consistency

Modern distributed databases face challenges in achieving transactional consistency across distributed partitions. Traditional two-phase commit (2PC) protocols incur high coordination overhead and latency, and require complex recovery for…

Databases · Computer Science 2026-03-03 Quanqing Xu, Chen Qian, Chuanhui Yang, Fanyu Kong, Guixiang Liu, Fusheng Han, Zixiang Zhai

COLE$^+$: Towards Practical Column-based Learned Storage for Blockchain Systems

Blockchain provides a decentralized and tamper-resistant ledger for securely recording transactions across a network of untrusted nodes. While its transparency and integrity are beneficial, the substantial storage requirements for…

Databases · Computer Science 2026-03-03 Ce Zhang, Cheng Xu, Haibo Hu, Jianliang Xu