Databases - Scifaro

Numerical benchmark for damage identification in Structural Health Monitoring

The availability of a dataset for validation and verification purposes of novel data-driven strategies and/or hybrid physics-data approaches is currently one of the most pressing challenges in the engineering field. Data ownership,…

Databases · Computer Science 2026-03-13 Francesca Marafini, Giacomo Zini, Alberto Barontini, Nuno Mendes, Alice Cicirello, Michele Betti, Gianni Bartoli

Sema: A High-performance System for LLM-based Semantic Query Processing

The integration of Large Language Models (LLMs) into data analytics has unlocked powerful capabilities for reasoning over bulk structured and unstructured data. However, existing systems typically rely on either DataFrame primitives, which…

Databases · Computer Science 2026-03-13 Kangkang Qi, Dongyang Xie, Wenbo Li, Hao Zhang, Yuanyuan Zhu, Jeffrey Xu Yu, Kangfei Zhao

LHGstore: An In-Memory Learned Graph Storage for Fast Updates and Analytics

Various real-world applications rely on in-memory dynamic graphs that must efficiently handle frequent updates while supporting low-latency analytics on evolving structures. Achieving both objectives remains challenging due to the trade-off…

Databases · Computer Science 2026-03-13 Pengpeng Qiao, Zhiwei Zhang, Xinzhou Wang, Zhetao Li, Xiaochun Cao, Yang Cao

PRMB: Benchmarking Reward Models in Long-Horizon CBT-based Counseling Dialogue

Large language models (LLMs) hold potential for mental healthcare applications, particularly in cognitive behavioral therapy (CBT)-based counseling, where reward models play a critical role in aligning LLMs with preferred therapeutic…

Databases · Computer Science 2026-03-13 Yougen Zhou, Qin Chen, Ningning Zhou, Jie Zhou, Liang He

Faster Relational Algorithms Using Geometric Data Structures

Optimization tasks over relational data, such as clustering, often suffer from the prohibitive cost of join operations, which are necessary to access the full dataset. While geometric data structures like BBD trees yield fast approximation…

Databases · Computer Science 2026-03-13 Aryan Esmailpour, Stavros Sintos

Towards Defect Phase Diagrams: From Research Data Management to Automated Workflows

Defect phase diagrams provide a unified description of crystal defect states for materials design and are central to the scientific objectives of the Collaborative Research Centre (CRC) 1394. Their construction requires the systematic…

Databases · Computer Science 2026-03-13 Khalil Rejiba, Sang-Hyeok Lee, Christina Gasper, Martina Freund, Sandra Korte-Kerzel, Ulrich Kerzel

SINDI: an Efficient Index for Approximate Maximum Inner Product Search on Sparse Vectors

Sparse vector Maximum Inner Product Search (MIPS) is crucial in multi-path retrieval for Retrieval-Augmented Generation (RAG). Recent inverted index-based and graph-based algorithms have achieved high search accuracy with practical…

Databases · Computer Science 2026-03-13 Ruoxuan Li, Xiaoyao Zhong, Jiabao Jin, Peng Cheng, Wangze Ni, Zhitao Shen, Wei Jia, Xiangyu Wang, Heng Tao Shen, Jingkuan Song

CARROT: A Learned Cost-Constrained Retrieval Optimization System for RAG

Large Language Models (LLMs) have demonstrated impressive ability in generation and reasoning tasks but struggle with handling up-to-date knowledge, leading to inaccuracies or hallucinations. Retrieval-Augmented Generation (RAG) mitigates…

Databases · Computer Science 2026-03-13 Ziting Wang, Haitao Yuan, Wei Dong, Gao Cong, Feifei Li

Beyond Standard Datacubes: Extracting Features from Irregular and Branching Earth System Data

Earth science datasets are growing rapidly in both volume and structural complexity. They increasingly contain richly labelled data with heterogeneous metadata and complex internal constraints that impose dependencies between variables and…

Databases · Computer Science 2026-03-12 Mathilde Leuridan, James Hawkes, Tiago Quintino, Martin Schultz

Pneuma-Seeker: A Relational Reification Mechanism to Align AI Agents with Human Work over Relational Data

When faced with data problems, many data workers cannot articulate their information need precisely enough for software to help. Although LLMs interpret natural-language requests, they behave brittly when intent is under-specified, e.g.,…

Databases · Computer Science 2026-03-12 Muhammad Imam Luthfi Balaka, John Hillesland, Kemal Badur, Raul Castro Fernandez

EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution

Neural text-to-SQL models, which translate natural language questions (NLQs) into SQL queries given a database schema, have achieved remarkable performance. However, database schemas frequently evolve to meet new requirements. Such schema…

Databases · Computer Science 2026-03-12 Tianshu Zhang, Kun Qian, Siddhartha Sahai, Yuan Tian, Shaddy Garg, Huan Sun, Yunyao Li

HiFIVE: High-Fidelity Vector-Tile Reduction for Interactive Map Exploration

Interactive visualization is a common tool for exploring large open-data repositories, where users quickly explore datasets across diverse domains. When it comes to large-scale spatial data, many existing tools rely on server-side rendering…

Databases · Computer Science 2026-03-12 Tarlan Bahadori, Ahmed Eldawy

K-Join: Combining Vertex Covers for Parallel Joins

Significant research effort has been devoted to improving the performance of join processing in the massively parallel computation model, where the goal is to evaluate a query with the minimum possible data transfer between machines.…

Databases · Computer Science 2026-03-12 Simon Frisk, Austen Fan, Paraschos Koutris

Categorical Calculus and Algebra for Multi-Model Data

Multi-model databases are designed to store, manage, and query data in various models, such as relational, hierarchical, and graph data, simultaneously. In this paper, we provide a theoretical basis for querying categorical databases. We…

Databases · Computer Science 2026-03-12 Jiaheng Lu

Tursio for Credit Unions: Structured Data Search with Automated Context Graphs

Extracting actionable insights from structured databases in regulated industries, such as credit unions, is often hindered by complex schemas, legacy systems, and stringent data governance requirements. We present Tursio, a secure,…

Databases · Computer Science 2026-03-12 Shivani Tripathi, Ravi Shetye, Shi Qiao, Alekh Jindal

Expressive Power of Property Graph Constraint Languages

We present the first principled and systematic study of the expressive power of property graph constraint languages, focused on the recent PG-Keys language, set to inform the upcoming revision of the GQL standard. To this end, we position…

Databases · Computer Science 2026-03-11 Stefania Dumbrava, Nadime Francis, Victor Marsault, Steven Sailly

Epistemic Closure: Autonomous Mechanism Completion for Physically Consistent Simulation

The integration of Large Language Models (LLMs) into scientific discovery is currently hindered by the Implicit Context problem, where governing equations extracted from literature contain invisible thermodynamic assumptions (e.g.,…

Databases · Computer Science 2026-03-11 Yue Wua, Tianhao Su, Rui Hu, Mingchuan Zhao, Shunbo Hu, Deng Pan, Jizhong Huang

Local Stability of Rankings

Rankings play a crucial role in decision-making. However, if minor changes to items significantly alter their rankings, the quality of the decisions being made can be compromised. The stability of ranking is a measure used to assess how…

Databases · Computer Science 2026-03-11 Felix S. Campbell, Yuval Moskovitch

No Cliques Allowed: The Next Step Towards BDD/FC Conjecture

This paper addresses one of the fundamental open questions in the realm of existential rules: the conjecture on the finite controllability of bounded derivation depth rule sets (bdd $\Rightarrow$ fc). We take a step toward a positive…

Databases · Computer Science 2026-03-11 Lucas Larroque, Piotr Ostropolski-Nalewaja, Michaël Thomazo

The Virtuous Cycle: AI-Powered Vector Search and Vector Search-Augmented AI

Modern AI and vector search are rapidly converging, forming a promising research frontier in intelligent information systems. On one hand, advances in AI have substantially improved the semantic accuracy and efficiency of vector search,…

Databases · Computer Science 2026-03-11 Jiuqi Wei, Quanqing Xu, Chuanhui Yang