Databases - Scifaro

Cross-level Privacy Preserving Utility Mining

Privacy-preserving utility mining (PPUM) aims to hide sensitive high-utility patterns while preserving the utility of the sanitized database. In practice, however, many datasets are associated with taxonomic information, which makes the…

Databases · Computer Science 2026-05-04 Jiahong Cai, Wensheng Gan, Philip S. Yu

Index-Assisted Stratified Sampling for Online Aggregation

Ad-hoc queries over frequently updated data in a flat schema are common in real-time data analysis applications and often require very low latency. Online aggregation can achieve this by providing approximate aggregation answers with…

Databases · Computer Science 2026-05-01 Yunnan Yu, Zhuoyue Zhao

Tailwind: A Practical Framework for Query Accelerators

Relational database management systems (RDBMSes) can process general-purpose queries, but often have lower performance compared to custom-built solutions for specific queries. For example, consider a group-by query over a few known groups…

Databases · Computer Science 2026-05-01 Geoffrey X. Yu, Ryan Marcus, Tim Kraska

SynSQL: Synthesizing Relational Databases for Robust Evaluation of Text-to-SQL Systems

Evaluating text-to-SQL systems remains largely fragile: correctness is typically judged by executing predicted and gold SQL queries on a single static database, even though the same queries may behave differently under alternative database…

Databases · Computer Science 2026-05-01 Mohammadamin Habibollah, Davood Rafiei

Unified Data Discovery across Query Modalities and User Intents

Data discovery - retrieving relevant tables from a data lake in response to user queries - is a fundamental building block for downstream analytics. In practice, data discovery must support different query modalities, including natural…

Databases · Computer Science 2026-05-01 Tingting Wang, Shixun Huang, Zhifeng Bao, J. Shane Culpepper, Shazia Sadiq, Volkan Dedeoglu, Reza Arablouei

Graphify: Automated Synthesis of Type-Safe Graph Backends via $O(S)$ GraphQL-to-Gremlin Transpilation

Graph databases offer unparalleled flexibility for managing interconnected data, yet the lack of strict schema enforcement often leads to runtime uncertainties and complex query development. This paper introduces Graphify, an end-to-end…

Databases · Computer Science 2026-05-01 Johannes Graf

CubeGraph: Efficient Retrieval-Augmented Generation for Spatial and Temporal Data

Hybrid queries combining high-dimensional vector similarity search with spatio-temporal filters are increasingly critical for modern retrieval-augmented generation (RAG) systems. Existing systems typically handle these workloads by nesting…

Databases · Computer Science 2026-05-01 Mingyu Yang, Wentao Li, Wei Wang

GeoBenchr: An Application-Centric Benchmarking Suite for Spatiotemporal Database Platforms

The rapid growth of spatiotemporal data volumes calls for database systems capable of efficiently managing and querying such data. Existing systems such as PostGIS, SpaceTime, and MobilityDB offer partial solutions but differ…

Databases · Computer Science 2026-05-01 Tim C. Rese, Nils Japke, Diana Baumann, Natalie Carl, David Bermbach

QQESPM: A Quantitative and Qualitative Spatial Pattern Matching Algorithm

The Spatial Pattern Matching (SPM) query allows for the retrieval of Points of Interest (POIs) based on spatial patterns defined by keywords and distance criteria. However, it does not consider the connectivity between POIs. In this study,…

Databases · Computer Science 2026-05-01 Carlos Minervino, Claudio Campelo, Maxwell Oliveira, Salatiel Silva

PiLLar: Matching for Pivot Table Schema via LLM-guided Monte-Carlo Tree Search

Pivot tables are ubiquitous in data lakes of modern data ecosystems, making accurate schema matching over pivot tables a key prerequisite for data integration. In this paper, we focus on matching for pivot table schema, which is a novel…

Databases · Computer Science 2026-04-30 Yunjun Gao, Chuangyu Ouyang, Congcong Ge, Yifan Zhu

Evergreen: Efficient Claim Verification for Semantic Aggregates

With recent semantic query processing engines, semantic aggregation has become a primitive operator, enabling the reduction of a relation into a natural language aggregate using an LLM. However, the resulting semantic aggregate may contain…

Databases · Computer Science 2026-04-30 Alexander W. Lee, Benjamin Han, Shayak Sen, Sam Yeom, Ugur Cetintemel, Anupam Datta

CacheRAG: A Semantic Caching System for Retrieval-Augmented Generation in Knowledge Graph Question Answering

The integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) has significantly advanced Knowledge Graph Question Answering (KGQA). However, existing LLM-driven KGQA systems act as stateless planners, generating…

Databases · Computer Science 2026-04-30 Yushi Sun, Lei Chen

Mining Negative Sequential Patterns to Improve Viral Genomic Feature Representation and Classification

Viruses represent the most abundant biological entities on Earth and play a pivotal role in microbial ecosystems, yet, as prominent human pathogens, they are closely linked to human morbidity and mortality. Accurate identification of viral…

Databases · Computer Science 2026-04-30 Wenxi Zhu, Wensheng Gan, Zhenlian Qi

Both Ends Count! Just How Good are LLM Agents at "Text-to-Big SQL"?

Text-to-SQL and Big Data are both extensively benchmarked fields, yet there is limited research that evaluates them jointly. In the real world, Text-to-SQL systems are often embedded in Big Data workflows, such as large-scale data…

Databases · Computer Science 2026-04-30 Germán T. Eizaguirre, Lars Tissen, Marc Sánchez-Artigas

VisualNeo: Bridging the Gap between Visual Query Interfaces and Graph Query Engines

Visual Graph Query Interfaces (VQIs) empower non-programmers to query graph data by constructing visual queries intuitively. Devising efficient technologies in Graph Query Engines (GQEs) for interactive search and exploration has also been…

Databases · Computer Science 2026-04-29 Kai Huang, Houdong Liang, Chongchong Yao, Xi Zhao, Yue Cui, Yao Tian, Ruiyuan Zhang, Xiaofang Zhou

A Demonstration of SQLyzr: A Platform for Fine-Grained Text-to-SQL Evaluation and Analysis

Text-to-SQL models have significantly improved with the adoption of Large Language Models (LLMs), leading to their increasing use in real-world applications. Although many benchmarks exist for evaluating the performance of text-to-SQL…

Databases · Computer Science 2026-04-29 Sepideh Abedini, M. Tamer Özsu

Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis

With the development of large language models (LLMs), numerous studies integrate LLMs through operator-like components to enhance relational data processing tasks, e.g., filters with semantic predicates, knowledge-augmented table…

Databases · Computer Science 2026-04-29 Yunxiang Su, Tianjing Zeng, Zhongjun Ding, Yin Lin, Rong Zhu, Zhewei Wei, Bolin Ding, Jingren Zhou

BoomHQ: Learning to Boost Multiple Hybrid Queries on Vector DBMSs

Hybrid queries, which combine vector nearest neighbor searches with scalar predicates, represent a fundamental challenge in managing vector databases. Existing methods often restrict the number of vector columns involved or the complexity…

Databases · Computer Science 2026-04-28 Ermu Qiu, Tianyi Chen, Jun Gao, Xing Wei, Yaofeng Tu, Yinjun Han, Yang Lin

Exact Mining of Dense Patterns via Direct Evaluation of Local Interval Frequency Using a Sliding Window

Accurately extracting patterns that appear frequently only within specific time intervals, together with their dense intervals, is important in many applications such as understanding seasonal demand and detecting anomalous…

Databases · Computer Science 2026-04-28 Taihei Takahashi, Kanata Takayasu, Satoshi Suga, Satoshi Kurihara

DataClaw: An Autonomous Data Agent with Instant Messaging Integration

In daily life, there are many scenarios in which people need to tackle data-related tasks, such as filling out forms, analyzing Excel files, and visualizing data reports. However, the tools available for these tasks are often fragmented, requiring users…

Databases · Computer Science 2026-04-28 Huahang Li, Wentao Hu, Zhuoyue Wan, Chen Jason Zhang, Haoyang Li, Xiaoyong Wei