Databases - Scifaro

Should I Hide My Duck in the Lake?

Data lakes spend a significant fraction of query execution time on scanning data from remote, disaggregated storage. Decoding alone accounts for 46% of runtime when running TPC-H directly on Parquet files. To address this bottleneck, we…

Databases · Computer Science · 2026-05-06 · Jonas Dann, Gustavo Alonso

Efficient and Scalable Self-Healing Databases Using Meta-Learning and Dependency-Driven Recovery

Modern database management systems (DBMS) face significant challenges in maintaining performance and availability under dynamic workloads. This paper proposes a novel self-healing framework that integrates Model-Agnostic Meta-Learning…

Databases · Computer Science · 2026-05-06 · Joydeep Chandra, Prabal Manhas

WhaleVis: Visualizing the History of Commercial Whaling

Whales are an important part of the oceanic ecosystem. Although historic commercial whale hunting, a.k.a. whaling, has severely threatened whale populations, whale researchers are looking at historical whaling data to inform current whale…

Databases · Computer Science · 2026-05-06 · Ameya Patil, Zoe Rand, Trevor Branch, Leilani Battle

Static Type Checking for Database Access Code

JDBC remains a key technology for database access in Java applications. Since the database dictionary and the Java type system have distinct scopes, developers inevitably need to deal with bugs in SQL-to-Java type mappings. We propose an…

Databases · Computer Science · 2026-05-05 · Thomas James Kirz, Werner Dietl, Mattias Ulbrich, Stefanie Scherzinger

Unfair by design: eBPF-based scheduling of mixed database workloads

Modern database systems increasingly co-schedule time-sensitive and background tasks. In such mixed workloads, background tasks should ideally utilize only spare CPU capacity without interfering with latency-critical requests. While some…

Databases · Computer Science · 2026-05-05 · Carl-Elliott Bilodeau-Savaria, Jan Kristof Nidzwetzki, Stefanie Scherzinger, Bettina Kemme

Actionable Understanding: Action Units for Bridging the Knowledge-Action Gap in Post-FAIR Knowledge Infrastructures

Despite unprecedented growth in biodiversity data, a persistent gap remains between what is known and what is acted upon. Existing frameworks such as the FAIR and CLEAR Principles have improved data accessibility and interpretability but do…

Databases · Computer Science · 2026-05-05 · Lars Vogt

Write-Read Decoupling in Modern Large-Scale Search Engines: Architectures, Techniques, and Emerging Approaches

Large-scale search engines face a fundamental tension: the index must be updated frequently to maintain freshness, yet updates create resource contention that inflates query latency. In the dominant Lucene-based architecture, segment merges…

Databases · Computer Science · 2026-05-05 · Xin Liang, Qing Yang, Wenru Qiu, Wenjie Mao, Tianyu Ma, Minghui Zhu, Nan Wang

Graph Query Generation with Constraint-guided Large Language Agents

Knowledge Graph Question Answering (KGQA) has advanced through structured query generation, yet most efforts target RDF/SPARQL, leaving Cypher and property graphs underexplored, despite increasing demand for unified KGQA in industry…

Databases · Computer Science · 2026-05-05 · Mengying Wang, Nicolaas Jedema, Rahul Pandey, RaviKiran Krishnan, Jens Lehmann, Yinghui Wu

BBC: Improving Large-k Approximate Nearest Neighbor Search with a Bucket-based Result Collector

Although Approximate Nearest Neighbor (ANN) search has been extensively studied, large-k ANN queries that aim to retrieve a large number of nearest neighbors remain underexplored, despite their numerous real-world applications. Existing ANN…

Databases · Computer Science · 2026-05-05 · Ziqi Yin, Gao Cong, Kai Zeng, Jinwei Zhu, Bin Cui

Enzyme: Incremental View Maintenance for Data Engineering

Materialized views are a core construct in database systems, used to accelerate analytical queries and optimize batch pipelines for extract-transform-load (ETL) workflows. Maintaining view consistency as underlying data evolves is a…

Databases · Computer Science · 2026-05-05 · Ritwik Yadav, Supun Abeysinghe, Min Yang, Jeffrey Helt, Manuel Ung, Yuhong Chen, Melody Hu, William Wei, Yiming Yang, Tom van Bussel, Sourav Chatterji, Indrajit Roy, Paul Lappas, Yannis Papakonstantinou, Tahir Fayyaz, Bilal Aslam, Ross Bunker, Michael Armbrust, Shrikanth Shankar

RNSG: A Range-Aware Graph Index for Efficient Range-Filtered Approximate Nearest Neighbor Search

Range-filtered approximate nearest neighbor (RFANN) search is a fundamental operation in modern data systems. Given a set of objects, each with a vector and a numerical attribute, an RFANN query retrieves the nearest neighbors to a query…

Databases · Computer Science · 2026-05-05 · Zhiqiu Zou, Ziqi Yin, Rong-Hua Li, Hongchao Qin, Qiangqiang Dai, Guoren Wang

ConStruM: A Structure-Guided LLM Framework for Context-Aware Schema Matching

Column matching is a central task in reconciling schemas for data integration. Column names and descriptions are valuable for this task. LLMs can leverage such natural-language schema metadata. However, in many datasets, correct matching…

Databases · Computer Science · 2026-05-05 · Houming Chen, Zhe Zhang, H. V. Jagadish

Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement

Businesses increasingly rely on large language models (LLMs) to automate simple repetitive tasks instead of developing custom machine learning models. LLMs require few, if any, training examples and can be utilized by users without…

Databases · Computer Science · 2026-05-05 · Nils Strassenburg, Boris Glavic, Tilmann Rabl

MINT: Multi-Vector Search Index Tuning

Vector search plays a crucial role in many real-world applications. In addition to single-vector search, multi-vector search has become important for multi-modal and multi-feature scenarios. In a multi-vector database, each row is an…

Databases · Computer Science · 2026-05-05 · Jiongli Zhu, Yue Wang, Bailu Ding, Philip A. Bernstein, Vivek Narasayya, Surajit Chaudhuri

Within-Dataset Disclosure Risk for Differential Privacy

Differential privacy (DP) enables private data analysis. In a typical DP deployment, controllers manage individuals' sensitive data and are responsible for answering analysts' queries while protecting individuals' privacy. They do so by…

Databases · Computer Science · 2026-05-05 · Zhiru Zhu, Raul Castro Fernandez

Complete Integration of Team Project-based Learning into a Database Syllabus

Team project-based learning (TPBL) combines two learning techniques: project-based learning (PBL) and teamwork. This combination leverages the learning outcomes of both methods and places students in a real work situation where they must…

Databases · Computer Science · 2026-05-04 · S. Iserte, V. R. Tomas, M. Pérez, M. Castillo, P. Boronat, L. A. García

Living Databases: A Unified Model for Continuous Schema Evolution, Versioning, and Transformations

Databases, and datasets more generally, evolve continuously through updates, transformations, versioning, schema changes, streaming operations, and other mechanisms. While prior work has noted connections among some of these areas, they…

Databases · Computer Science · 2026-05-04 · Amol Deshpande

EGREFINE: An Execution-Grounded Optimization Framework for Text-to-SQL Schema Refinement

Text-to-SQL enables non-expert users to query databases in natural language, yet real-world schemas often suffer from ambiguous, abbreviated, or inconsistent naming conventions that degrade model accuracy. Existing approaches treat schemas…

Databases · Computer Science · 2026-05-04 · Jiaqian Wang, Yutao Qi, Wenjin Hou, Yu Pang, Rui Yang

Multiset semantics in SPARQL, Relational Algebra and Datalog

The paper analyzes and characterizes the algebraic and logical structure of the multiset semantics for SPARQL patterns involving AND, UNION, FILTER, EXCEPT, and SELECT. To do this, we align SPARQL with two well-established query languages:…

Databases · Computer Science · 2026-05-04 · Renzo Angles, Claudio Gutierrez, Daniel Hernández

SiriusHelper: An LLM Agent-Based Operations Assistant for Big Data Platforms

Big data platforms are widely used in modern enterprises, and an in-production intelligent assistant is increasingly important to help users quickly find actionable guidance and reduce operational burden. While recent LLM+RAG assistants…

Databases · Computer Science · 2026-05-04 · Yu Shen, Shiyang Liu, Qihang He, Yihang Cheng, Haining Xie, Zhiming He, Huahua Fan, Xianzhi Tan, Teng Ma, Shaoquan Zhang, Danqing Huang, Fan Jiang, Yang Li, Chongqing Zhao, Peng Chen, Jie Jiang, Bin Cui