Databases - Scifaro

N2E: A General Framework to Reduce Node-Differential Privacy to Edge-Differential Privacy for Graph Analytics

Differential privacy (DP) has been widely adopted to protect sensitive information in graph analytics. While edge-DP, which protects privacy at the edge level, has been extensively studied, node-DP, offering stronger protection for entire…

Databases · Computer Science 2025-11-26 Yihua Hu, Hao Ding, Wei Dong

Mobility Stream Processing on NebulaStream and MEOS

The increasing use of Internet-of-Things (IoT) sensors in moving objects has resulted in vast amounts of spatiotemporal streaming data. To analyze this data in situ, real-time spatiotemporal processing is needed. However, current stream…

Databases · Computer Science 2025-11-26 Mariana M. Garcez Duarte, Dwi P. A. Nugroho, Georges Tod, Evert Bevernage, Pieter Moelans, Emine Tas, Esteban Zimanyi, Mahmoud Sakr, Steffen Zeuch, Volker Markl

Beyond Relational: Semantic-Aware Multi-Modal Analytics with LLM-Native Query Optimization

Multi-modal analytical processing has the potential to transform applications in e-commerce, healthcare, entertainment, and beyond. However, real-world adoption remains elusive due to the limited ability of traditional relational query…

Databases · Computer Science 2025-11-26 Junhao Zhu, Lu Chen, Xiangyu Ke, Ziquan Fang, Tianyi Li, Yunjun Gao, Christian S. Jensen

On 10x Better Scalability: KV Stores Scale Up KV Cache

Large language models (LLMs) rely on Key-Value (KV) cache to reduce time-to-first-token (TTFT) latency, but existing disk-based KV cache systems using file-per-object layouts suffer from severe scalability bottlenecks due to file system…

Databases · Computer Science 2025-11-26 Weiping Yu, Ye Jiarui, He Mengke, Junfeng Liu, Siqiang Luo

LEANN: A Low-Storage Vector Index

Embedding-based vector search underpins many important applications, such as recommendation and retrieval-augmented generation (RAG). It relies on vector indices to enable efficient search. However, these indices require storing…

Databases · Computer Science 2025-11-26 Yichuan Wang, Zhifei Li, Shu Liu, Yongji Wu, Ziming Mao, Yilong Zhao, Xiao Yan, Zhiying Xu, Yang Zhou, Ion Stoica, Sewon Min, Matei Zaharia, Joseph E. Gonzalez

A General Framework for Per-record Differential Privacy

Differential Privacy (DP) is a widely adopted standard for privacy-preserving data analysis, but it assumes a uniform privacy budget across all records, limiting its applicability when privacy requirements vary with data values. Per-record…

Databases · Computer Science 2025-11-25 Xinghe Chen, Dajun Sun, Quanqing Xu, Wei Dong

Efficient Partition-based Approaches for Diversified Top-k Subgraph Matching

Subgraph matching is a core task in graph analytics, widely used in domains such as biology, finance, and social networks. Existing top-k diversified methods typically focus on maximizing vertex coverage, but often return results in the…

Databases · Computer Science 2025-11-25 Liuyi Chen, Yuchen Hu, Zhengyi Yang, Xu Zhou, Wenjie Zhang, Kenli Li

LLM and Agent-Driven Data Analysis: A Systematic Approach for Enterprise Applications and System-level Deployment

The rapid progress in Generative AI and Agent technologies is profoundly transforming enterprise data management and analytics. Traditional database applications and system deployment are fundamentally impacted by AI-driven tools, such as…

Databases · Computer Science 2025-11-25 Xi Wang, Xianyao Ling, Kun Li, Gang Yin, Liang Zhang, Jiang Wu, Annie Wang, Weizhe Wang

HERP: Hardware for Energy Efficient and Realtime DB Search and Cluster Expansion in Proteomics

Database search and clustering are fundamental components of many data analytics problems, such as mass spectrometry-driven proteomics. Traditional full clustering and search algorithms suffer from high resource usage and long latencies. We…

Databases · Computer Science 2025-11-25 Md Mizanur Rahaman Nayan, Zheyu Li, Flavio Ponzina, Sumukh Pinge, Tajana Rosing, Azad J. Naeemi

About the Multi-Head Linear Restricted Chase Termination

The chase is a ubiquitous algorithm in database theory. However, for existential rules (aka tuple-generating dependencies), its termination is not guaranteed, and even undecidable in general. The problem of termination becomes particularly…

Databases · Computer Science 2025-11-25 Lukas Gerlach, Lucas Larroque, Jerzy Marcinkowski, Piotr Ostropolski-Nalewaja

Anomaly Pattern-guided Transaction Bug Testing in Relational Databases

Concurrent transaction processing is a fundamental capability of Relational Database Management Systems (RDBMSs), widely utilized in applications requiring high levels of parallel user interaction, such as banking systems, e-commerce…

Databases · Computer Science 2025-11-24 Huicong Xu, Shuang Liu, Xianyu Zhu, Qiyu Zhuang, Wei Lu, Xiaoyong Du

RAG-Driven Data Quality Governance for Enterprise ERP Systems

Enterprise ERP systems managing hundreds of thousands of employee records face critical data quality challenges when human resources departments perform decentralized manual entry across multiple languages. We present an end-to-end pipeline…

Databases · Computer Science 2025-11-24 Sedat Bin Vedat, Enes Kutay Yarkan, Meftun Akarsu, Recep Kaan Karaman, Arda Sar, Çağrı Çelikbilek, Savaş Saygılı

[Experiment, Analysis, and Benchmark] Systematic Evaluation of Plan-based Adaptive Query Processing

Unreliable cardinality estimation remains a critical performance bottleneck in database management systems (DBMSs). Adaptive Query Processing (AQP) strategies address this limitation by providing a more robust query execution mechanism.…

Databases · Computer Science 2025-11-21 Pei Mu, Anderson Chaves Carniel, Antonio Barbalace, Amir Shaikhha

From Patents to Dataset: Scraping for Oxide Glass Compositions and Properties

In this work, we present web scraping techniques to extract information from patent tables, clean and structure them for future use in predictive machine learning models to develop new glasses. We extracted compositions and three…

Databases · Computer Science 2025-11-21 Gustavo Laranja Thomaello, Thomaz Yeiden Busnardo Aguena, Eric Trevelato Costa, Rafael Baságlia Rosante, Thiago Rodrigo Ramos, Daiane Aparecida Zuanetti, Edgar Dutra Zanotto

Benchmarking Table Extraction from Heterogeneous Scientific Extraction Documents

Table Extraction (TE) consists in extracting tables from PDF documents, in a structured format which can be automatically processed. While numerous TE tools exist, the variety of methods and techniques makes it difficult for users to choose…

Databases · Computer Science 2025-11-21 Marijan Soric, Cécile Gracianne, Ioana Manolescu, Pierre Senellart

AskDB: An LLM Agent for Natural Language Interaction with Relational Databases

Interacting with relational databases remains challenging for users across different expertise levels, particularly when composing complex analytical queries or performing administrative tasks. Existing systems typically address either…

Databases · Computer Science 2025-11-21 Xuan-Quang Phan, Tan-Ha Mai, Thai-Duy Dinh, Minh-Thuan Nguyen, Lam-Son Lê

B+ANN: A Fast Billion-Scale Disk-based Nearest-Neighbor Index

Storing and processing of embedding vectors by specialized Vector databases (VDBs) has become the linchpin in building modern AI pipelines. Most current VDBs employ variants of a graph-based approximate nearest-neighbor (ANN) index…

Databases · Computer Science 2025-11-20 Selim Furkan Tekin, Rajesh Bordawekar

Castle: Causal Cascade Updates in Relational Databases with Large Language Models

This work introduces Castle, the first framework for schema-only cascade update generation using large language models (LLMs). Despite recent advances in LLMs for Text2SQL code generation, existing approaches focus primarily on SELECT…

Databases · Computer Science 2025-11-20 Yongye Su, Yucheng Zhang, Zeru Shi, Bruno Ribeiro, Elisa Bertino

Natural Language Interfaces for Databases: What Do Users Think?

Natural Language Interfaces for Databases (NLIDBs) aim to make database querying accessible by allowing users to ask questions in everyday language rather than using formal SQL queries. Despite significant advancements in translation…

Databases · Computer Science 2025-11-19 Panos Ipeirotis, Haotian Zheng

Scalable Enforcement of Fine Grained Access Control Policies in Relational Database Management Systems

The proliferation of smart technologies and evolving privacy regulations such as the GDPR and CPRA has increased the need to manage fine-grained access control (FGAC) policies in database management systems (DBMSs). Existing approaches to…

Databases · Computer Science 2025-11-19 Anadi Shakya, Primal Pappachan, David Maier, Roberto Yus, Sharad Mehrotra, Johann-Christoph Freytag