Databases - Scifaro

A GPU-Accelerated Framework for Multi-Attribute Range Filtered Approximate Nearest Neighbor Search

Range-filtered approximate nearest neighbor search (RFANNS) is increasingly critical for modern vector databases. However, existing solutions suffer from severe index inflation and construction overhead. Furthermore, they rely exclusively…

Databases · Computer Science 2026-04-28 Zhonggen Li, Haoran Yu, Zixuan Xu, Yifan Zhu, Yunjun Gao

PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans

Recent database systems have introduced semantic operators that leverage large language models (LLMs) to filter, join, and project over structured data using natural language predicates. In practice, these operators are combined with…

Databases · Computer Science 2026-04-28 Qiuyang Mang, Yufan Xiang, Hangrui Zhou, Runyuan He, Jiaxiang Yu, Hanchen Li, Aditya Parameswaran, Alvin Cheung

Descriptor: Multi-Regional Cloud Honeypot Dataset (MURHCAD)

This data article introduces a comprehensive, high-resolution honeynet dataset designed to support standalone analyses of global cyberattack behaviors. Collected over a continuous 72-hour window (June 9 to 11, 2025) on Microsoft Azure, the…

Databases · Computer Science 2026-04-28 Enrique Feito-Casares, Ismael Gómez-Talal, José-Luis Rojo-Álvarez

Efficient Mining of Low-Utility Sequential Patterns

Discovering valuable insights from rich data is a crucial task for exploratory data analysis. Sequential pattern mining (SPM) has found widespread applications across various domains. In recent years, low-utility sequential pattern mining…

Databases · Computer Science 2026-04-28 Jian Zhu, Zhidong Lin, Wensheng Gan, Philip S. Yu

Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

Knowledge-intensive analytical applications retrieve context from both structured tabular data and unstructured, free-text documents for effective decision-making. Large language models (LLMs) have made it significantly easier to prototype…

Databases · Computer Science 2026-04-28 Anas Dorbani, Sunny Yasser, Jimmy Lin, Amine Mhedhbi

Time travel for knowledge graphs: live queries over RDF change histories

Performing time-traversal queries on RDF datasets remains unsupported in the most extensive knowledge graphs. Existing solutions either require offline ingestion, which prevents concurrent querying and updating, or operate live but with…

Databases · Computer Science 2026-04-28 Arcangelo Massari, Silvio Peroni

A dataset of early blockchain-registered AI agents on Ethereum

This study presents a structured dataset of blockchain-registered artificial intelligence agents under the ERC-8004 standard on Ethereum. The dataset integrates on-chain identity records, minting transactions, transfer events, reputation…

Databases · Computer Science 2026-04-27 Yulin Liu

It's Time to Standardize RDF Messages

RDF-based systems increasingly operate in event-driven and streaming settings, where producers and consumers exchange data as discrete units of communication rather than as freely mergeable RDF statements. As existing RDF semantics and…

Databases · Computer Science 2026-04-27 Pieter Colpaert, Piotr Sowinski

How Hard is it to Decide if a Fact is Relevant to a Query?

We consider the following fundamental problem: given a database D, Boolean conjunctive query (CQ) q, and fact f in D, decide whether f is relevant to q wrt. D, i.e., does f belong to a minimal subset S of D such that S |= q. Despite being…

Databases · Computer Science 2026-04-27 Meghyn Bienvenu, Diego Figueira, Pierre Lafourcade

A Model-Driven Approach to Database Migration with a Unified Data Model

Database migration is a key task in software modernization, increasingly involving transformations across heterogeneous data models such as relational and NoSQL systems. Existing approaches are typically designed for specific source-target…

Databases · Computer Science 2026-04-27 María J. Ortín, José R. Hoyos, Jesus García-Molina

MCI: A Maximal Clique Index for Efficient Arbitrary-Filtered Approximate Nearest Neighbor Search

Approximate Nearest Neighbor Search with arbitrary filtering predicates (AFANNS) is essential for modern data applications, yet existing methods often incur substantial storage and computational costs. In this work, we introduce the Maximal…

Databases · Computer Science 2026-04-27 Xiaowei Ye, Rong-Hua Li, Guoren Wang, Kaiwen Xue, Daiyin Wang, Xubin Li

Implementation and Privacy Guarantees for Scalable Keyword Search on SOLID-based Decentralized Data with Granular Visibility Constraints

In decentralized personal data ecosystems grounded in architectures such as Solid, users retain sovereignty over their data via personal online data stores (pods), hosted on Solid-compliant server infrastructures. In such environments, data…

Databases · Computer Science 2026-04-27 Mohamed Ragab, Faria Ferooz, Mohammad Bahrani, Helen Oliver, Thanassis Tiropanis, Alexandra Poulovassilis, Adriane Chapman, George Roussos

SAGE: Selective Attention-Guided Extraction for Token-Efficient Document Indexing

Large language models with long context windows can answer complex questions directly from full-length academic, technical, and policy documents, but passing entire documents is often costly, slow, and can degrade answer quality while…

Databases · Computer Science 2026-04-27 Xinzhi Wang, Peter Baile Chen, Gerardo Vitagliano, Matthew Russo, Jun Chen, Michael Cafarella, Samuel Madden, Chunwei Liu

LLM+Graph@VLDB'2025 Workshop Summary

The integration of large language models (LLMs) with graph-structured data has become a pivotal and fast-evolving research frontier, drawing strong interest from both academia and industry. The 2nd LLM+Graph Workshop, co-located with the…

Databases · Computer Science 2026-04-27 Yixiang Fang, Arijit Khan, Tianxing Wu, Da Yan, Shu Wang

Natural Language Interfaces for Spatial and Temporal Databases: A Comprehensive Overview of Methods, Taxonomy, and Future Directions

The task of building a natural language interface to a database, known as NLIDB, has recently gained significant attention from both the database and Natural Language Processing (NLP) communities. With the proliferation of geospatial…

Databases · Computer Science 2026-04-27 Samya Acharja, Kanchan Chowdhury

Hierarchical Decomposition of Separable Workflow-Nets

The Partially Ordered Workflow Language (POWL) has recently emerged as a process modeling notation, offering strong quality guarantees and high expressiveness. While early versions of POWL relied on strict block-structured operators for…

Databases · Computer Science 2026-04-27 Humam Kourani, Gyunam Park, Wil M. P. van der Aalst

TiInsight: A SQL-based Automated Exploratory Data Analysis System through Large Language Models

SQL-based exploratory data analysis has garnered significant attention within the data analysis community. The emergence of large language models (LLMs) has facilitated the paradigm shift from manual to automated data exploration.…

Databases · Computer Science 2026-04-27 Jun-Peng Zhu, Boyan Niu, Peng Cai, Zheming Ni, Kai Xu, Jiajun Huang, Shengbo Ma, Bing Wang, Xuan Zhou, Guanglei Bao, Donghui Zhang, Liu Tang, Qi Liu

HIRE: A Hybrid Learned Index for Robust and Efficient Performance under Mixed Workloads

Indexes are critical for efficient data retrieval and updates in modern databases. Recent advances in machine learning have led to the development of learned indexes, which model the cumulative distribution function of data to predict…

Databases · Computer Science 2026-04-27 Xinyi Zhang, Liang Liang, Anastasia Ailamaki, Jianliang Xu

An Alternate Agentic AI Architecture (It's About the Data)

For the last several years, the dominant narrative in "agentic AI" has been that large language models should orchestrate information access by dynamically selecting tools, issuing sub-queries, and synthesizing results. We argue this…

Databases · Computer Science 2026-04-24 Fabian Wenz, Felix Treutwein, Kai Arenja, Çagatay Demiralp, Michael Stonebraker

Scaling Worst-Case Optimal Datalog to GPUs

Datalog is a declarative logic-programming language used for complex analytic reasoning workloads such as program analysis and graph analytics. Datalog's popularity is due to its unique price-point, marrying logic-defined specification with…

Databases · Computer Science 2026-04-24 Yihao Sun, Kunting Qi, Thomas Gilray, Sidharth Kumar, Kristopher Micinski