Databases - Scifaro

Accelerating Large-Scale Cheminformatics Using a Byte-Offset Indexing Architecture for Terabyte-Scale Data Integration

The integration of large-scale chemical databases represents a critical bottleneck in modern cheminformatics research, particularly for machine learning applications requiring high-quality, multi-source validated datasets. This paper…

Databases · Computer Science 2026-03-23 Malikussaid, Septian Caesar Floresko, Sutiyo

Database Theory in Action: Direct Access to Query Answers

Direct access asks for the retrieval of query answers by their ranked position, given a query and a desired order. While the time complexity of data structures supporting such accesses has been studied in depth, and efficient algorithms for…

Databases · Computer Science 2026-03-23 Jiayin Hu, Nikolaos Tziavelis

Condensed Representation for Snapshot-Based RDF Graphs

Evolving phenomena, often complex, can be represented using knowledge graphs, which can model heterogeneous data from multiple sources. Nowadays, a considerable number of sources delivering periodic updates to knowledge…

Databases · Computer Science 2026-03-23 Jey Puget Gil, Emmanuel Coquery, John Samuel, Gilles Gesquiere

Let's Play Tag: Linear Time Evaluation of Conjunctive Queries under TGD Constraints

We study the limits of linear time evaluation of conjunctive queries under constraints expressed as tuple-generating dependencies (TGDs), across several modes of query evaluation: single-testing, all-testing, counting, lexicographic direct…

Databases · Computer Science 2026-03-20 Nofar Carmeli, Carsten Lutz, Marcin Przybyłko

QuaQue: Design and SQL Implementation of Condensed Algebra for Concurrent Versioning of Knowledge Graphs

The management of versioned knowledge graphs presents significant challenges, particularly in querying data across multiple versions efficiently. This paper introduces QuaQue, a key component of the ConVer-G system, which addresses this…

Databases · Computer Science 2026-03-20 Jey Puget Gil, Emmanuel Coquery, John Samuel, Gilles Gesquière

SODIUM: From Open Web Data to Queryable Databases

During research, domain experts often ask analytical questions whose answers require integrating data from a wide range of web sources. Thus, they must spend substantial effort searching, extracting, and organizing raw data before analysis…

Databases · Computer Science 2026-03-20 Chuxuan Hu, Philip Li, Maxwell Yang, Daniel Kang

SIMD-PAC-DB: Pretty Performant PAC Privacy

This work presents a highly optimized implementation of PAC-DB, a recent and promising database privacy model. We prove that our SIMD-PAC-DB can compute the same privatized answer with just a single query, instead of the 128 stochastic…

Databases · Computer Science 2026-03-20 Ilaria Battiston, Dandan Yuan, Xiaochen Zhu, Peter Boncz

DP-S4S: Accurate and Scalable Select-Join-Aggregate Query Processing with User-Level Differential Privacy

Answering Select-Join-Aggregate queries with DP is a fundamental problem with important applications in various domains. The current SOTA methods ensure user-level DP (i.e., the adversary cannot infer the presence or absence of any given…

Databases · Computer Science 2026-03-20 Yuan Qiu, Xiaokui Xiao, Yin Yang

A New Lower Bounding Paradigm and Tighter Lower Bounds for Elastic Similarity Measures

Elastic similarity measures are fundamental to time series similarity search because of their ability to handle temporal misalignments. These measures are inherently computationally expensive, therefore necessitating the use of lower bounds…

Databases · Computer Science 2026-03-20 Zemin Chao, Boyu Xiao, Zitong Li, Zhixin Qi, Xianglong Liu, Hongzhi Wang

Multiverse: Transactional Memory with Dynamic Multiversioning

Software transactional memory (STM) allows programmers to easily implement concurrent data structures. STMs simplify atomicity. Recent STMs can achieve good performance for some workloads but they have some limitations. In particular, STMs…

Databases · Computer Science 2026-03-20 Gaetano Coccimiglio, Trevor Brown, Srivatsan Ravi

LLMIA: An Out-of-the-Box Index Advisor via In-Context Learning with LLMs

Index recommendation is crucial for optimizing database performance. However, existing heuristic- and learning-based methods often rely on inefficient exhaustive search and estimated costs, leading to low efficiency (due to the vast search…

Databases · Computer Science 2026-03-20 Xinxin Zhao, Xinmei Huang, Haoyang Li, Jing Zhang, Shuai Wang, Tieying Zhang, Jianjun Chen, Rui Shi, Cuiping Li, Hong Chen

Halo: Domain-Aware Query Optimization for Long-Context Question Answering

Long-context question answering (QA) over lengthy documents is critical for applications such as financial analysis, legal review, and scientific research. Current approaches, such as processing entire documents via a single LLM call or…

Databases · Computer Science 2026-03-19 Pramod Chunduri, Francisco Romero, Ali Payani, Kexin Rong, Joy Arulraj

On the generic information capacity of relational schemas with a single binary relation

We consider database schemas consisting of a single binary relation, with key constraints and inclusion dependencies. Over this space of 20 schemas, we completely characterize when one schema is generically dominated by another schema.…

Databases · Computer Science 2026-03-19 Benoît Groz, Jan Hidders, Nina Pardal, Jan Van den Bussche, Piotr Wieczorek

Efficient and Effective Table-Centric Table Union Search in Data Lakes

In data lakes, information on the same subject is often fragmented across multiple tables. Table union search aims to find the top-k tables that can be unioned with a query table to extend it with more rows, without relying on metadata or…

Databases · Computer Science 2026-03-19 Yongkang Sun, Zhihao Ding, Huiqiang Wang, Reynold Cheng, Jieming Shi

ListK: Semantic ORDER BY and LIMIT K with Listwise Prompting

Semantic operators abstract large language model (LLM) calls in SQL clauses. They are gaining traction as an easy method to analyze semi-structured, unstructured, and multimodal datasets. While a plethora of recent works optimize various…

Databases · Computer Science 2026-03-19 Jason Shin, Jiwon Chang, Fatemeh Nargesian

HierarchicalKV: A GPU Hash Table with Cache Semantics for Continuous Online Embedding Storage

Traditional GPU hash tables preserve every inserted key -- a dictionary assumption that wastes scarce High Bandwidth Memory (HBM) when embedding tables routinely exceed single-GPU capacity. We challenge this assumption with cache semantics,…

Databases · Computer Science 2026-03-19 Haidong Rong, Jiashu Yao, Matthias Langer, Shijie Liu, Li Fan, Dongxin Wang, Jia He, Jinglin Chen, Jiaheng Rang, Julian Qian, Mengyao Xu, Fan Yu, Minseok Lee, Zehuan Wang, Even Oldridge

Open Biomedical Knowledge Graphs at Scale: Construction, Federation, and AI Agent Access with Samyama Graph Database

Biomedical knowledge is fragmented across siloed databases -- Reactome for pathways, STRING for protein interactions, ClinicalTrials.gov for study registries, DrugBank for drug vocabularies, DGIdb for drug-gene interactions, SIDER for side…

Databases · Computer Science 2026-03-19 Madhulatha Mandarapu, Sandeep Kunkunuru

How to Write to SSDs

This paper demonstrates that adopting out-of-place writes is essential for database systems to fully leverage SSD performance and extend SSD lifespan. We propose a set of out-of-place optimizations that collectively reduce write…

Databases · Computer Science 2026-03-19 Bohyun Lee, Tobias Ziegler, Viktor Leis

Human-Data Interaction, Exploration, and Visualization in the AI Era: Challenges and Opportunities

The rapid advancement of AI is transforming human-centered systems, with profound implications for human-AI interaction, human-data interaction, and visual analytics. In the AI era, data analysis increasingly involves large-scale,…

Databases · Computer Science 2026-03-19 Jean-Daniel Fekete, Yifan Hu, Dominik Moritz, Arnab Nandi, Senjuti Basu Roy, Eugene Wu, Nikos Bikakis, George Papastefanatos, Panos K. Chrysanthis, Guoliang Li, Lingyun Yu

ORCA: ORchestrating Causal Agent

Causal analysis on relational databases is challenging, as analysis datasets must be repeatedly queried from complex schemas. Recent LLM systems can automate individual steps, but they hardly manage dependencies across analysis stages,…

Databases · Computer Science 2026-03-19 Joanie Hayoun Chung, Sumin Lee, Sungbin Lim