Databases - Scifaro

Graph-Based Feature Augmentation for Predictive Tasks on Relational Datasets

Data has become a foundational asset driving innovation across domains such as finance, healthcare, and e-commerce. In these areas, predictive modeling over relational tables is commonly employed, with increasing emphasis on reducing manual…

Databases · Computer Science 2025-08-29 Lianpeng Qiao, Ziqi Cao, Kaiyu Feng, Ye Yuan, Guoren Wang

Research Challenges in Relational Database Management Systems for LLM Queries

Large language models (LLMs) have become essential for applications such as text summarization, sentiment analysis, and automated question-answering. Recently, LLMs have also been integrated into relational database management systems to…

Databases · Computer Science 2025-08-29 Kerem Akillioglu, Anurag Chakraborty, Sairaj Voruganti, M. Tamer Özsu

Bootstrapping Learned Cost Models with Synthetic SQL Queries

Having access to realistic workloads for a given database instance is extremely important to enable stress and vulnerability testing, as well as to optimize for cost and performance. Recent advances in learned cost models have shown that…

Databases · Computer Science 2025-08-28 Michael Nidd, Christoph Miksovic, Thomas Gschwind, Francesco Fusco, Andrea Giovannini, Ioana Giurgiu

Robust Recursive Query Parallelism in Graph Database Management Systems

Efficient multi-core parallel processing of recursive join queries is critical for achieving good performance in graph database management systems (GDBMSs). Prior work adopts two broad approaches. First is the state-of-the-art morsel-driven…

Databases · Computer Science 2025-08-28 Anurag Chakraborty, Semih Salihoğlu

Enriching Object-Centric Event Data with Process Scopes: A Framework for Aggregation and Analysis

Object-Centric Process Mining enables the analysis of complex operational behavior by capturing interactions among multiple business objects (e.g., orders, items, deliveries). These interactions are recorded using Object-Centric Event Data…

Databases · Computer Science 2025-08-27 Shahrzad Khayatbashi, Majid Rafiei, Jiayuan Chen, Timotheus Kampik, Gregor Berg, Amin Jalali

Text to Query Plans for Question Answering on Large Tables

Efficient querying and analysis of large tabular datasets remain significant challenges, especially for users without expertise in programming languages like SQL. Text-to-SQL approaches have shown promising performance on benchmark data;…

Databases · Computer Science 2025-08-27 Yipeng Zhang, Chen Wang, Yuzhe Zhang, Jacky Jiang

Rethinking Caching for LLM Serving Systems: Beyond Traditional Heuristics

Serving Large Language Models (LLMs) at scale requires meeting strict Service Level Objectives (SLOs) under severe computational and memory constraints. Nevertheless, traditional caching strategies fall short: exact-matching and prefix…

Databases · Computer Science 2025-08-27 Jungwoo Kim, Minsang Kim, Jaeheon Lee, Chanwoo Moon, Heejin Kim, Taeho Hwang, Woosuk Chung, Yeseong Kim, Sungjin Lee

Optimal (α, β)-Dense Subgraph Search in Bipartite Graphs

Dense subgraph search in bipartite graphs is a fundamental problem in graph analysis, with wide-ranging applications in fraud detection, recommendation systems, and social network analysis. The recently proposed (α, β)-dense…

Databases · Computer Science 2025-08-27 Yalong Zhang, Rong-Hua Li, Qi Zhang, Guoren Wang

Brook-2PL: Tolerating High Contention Workloads with A Deadlock-Free Two-Phase Locking Protocol

The problem of hotspots remains a critical challenge in high-contention workloads for concurrency control (CC) protocols. Traditional concurrency control approaches encounter significant difficulties under high contention, resulting in…

Databases · Computer Science 2025-08-27 Farzad Habibi, Juncheng Fang, Tania Lorido-Botran, Faisal Nawab

Metrics, KPIs, and Taxonomy for Data Valuation and Monetisation -- A Systematic Literature Review

Data valuation and data monetisation are complex subjects but essential to most organisations today. Unfortunately, they still lack standard procedures and frameworks for organisations to follow. In this survey, we introduce the reader to…

Databases · Computer Science 2025-08-27 Eduardo Vyhmeister, Bastien Pietropaoli, Alejando Martinez Molina, Montserrat Gonzalez-Ferreiro, Gabriel Gonzalez-Castane, Jordi Arjona Aroca, Andrea Visentin

Accelerating Historical K-Core Search in Temporal Graphs

We study the temporal k-core component search (TCCS), which outputs the k-core containing the query vertex in the snapshot over an arbitrary query time window in a temporal graph. The problem has been shown to be critical for tasks such as…

Databases · Computer Science 2025-08-26 Zhuo Ma, Dong Wen, Kaiyu Chen, Yixiang Fang, Xuemin Lin, Wenjie Zhang
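To make the TCCS query in the abstract above concrete, here is a naive baseline sketch in Python. It is not the accelerated method of Ma et al. It materializes the snapshot for a query window [ts, te], peels it to its k-core, and returns the connected k-core component containing the query vertex, assuming (as "component search" suggests) that the answer is that connected component. The edge format `(u, v, t)` with integer timestamps and the names `snapshot`, `k_core`, and `tccs_naive` are hypothetical choices for illustration.

```python
from collections import defaultdict, deque


def snapshot(edges, ts, te):
    """Undirected adjacency of the snapshot: edges (u, v, t) with ts <= t <= te.
    Hypothetical edge format; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v, t in edges:
        if ts <= t <= te and u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def k_core(adj, k):
    """Peel vertices of degree < k until every remaining vertex has degree >= k."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    queue = deque(v for v, d in deg.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:  # a vertex may be queued more than once
            continue
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
                if deg[w] < k:
                    queue.append(w)
    return {v for v in adj if v not in removed}


def tccs_naive(edges, q, k, ts, te):
    """Connected k-core component containing q in the window snapshot (empty set if none).
    Naive baseline: rebuilds and peels the snapshot on every query."""
    adj = snapshot(edges, ts, te)
    core = k_core(adj, k)
    if q not in core:
        return set()
    component, stack = {q}, [q]
    while stack:  # DFS restricted to k-core vertices
        v = stack.pop()
        for w in adj[v]:
            if w in core and w not in component:
                component.add(w)
                stack.append(w)
    return component


edges = [(1, 2, 1), (2, 3, 1), (1, 3, 2), (3, 4, 5), (4, 5, 5)]
print(tccs_naive(edges, 1, 2, 1, 2))  # triangle 1-2-3 forms a 2-core
print(tccs_naive(edges, 1, 2, 1, 1))  # only the path 1-2-3, so no 2-core
```

Because it rebuilds and peels the snapshot on every query, this baseline does work linear in the number of temporal edges per query, regardless of how often similar windows are asked.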

Join Cardinality Estimation with OmniSketches

Join ordering is a key factor in query performance, yet traditional cost-based optimizers often produce sub-optimal plans due to inaccurate cardinality estimates in multi-predicate, multi-join queries. Existing alternatives such as…

Databases · Computer Science 2025-08-26 David Justen, Matthias Boehm

PGTuner: An Efficient Framework for Automatic and Transferable Configuration Tuning of Proximity Graphs

Approximate Nearest Neighbor Search (ANNS) plays a crucial role in many key areas. Proximity graphs (PGs) are the leading method for ANNS, offering the best balance between query efficiency and accuracy. However, their performance heavily…

Databases · Computer Science 2025-08-26 Hao Duan, Yitong Song, Bin Yao, Anqi Liang

TRIM: Accelerating High-Dimensional Vector Similarity Search with Enhanced Triangle-Inequality-Based Pruning

High-dimensional vector similarity search (HVSS) is critical for many data processing and AI applications. However, traditional HVSS methods often require extensive data access for distance calculations, leading to inefficiencies.…

Databases · Computer Science 2025-08-26 Yitong Song, Pengcheng Zhang, Chao Gao, Bin Yao, Kai Wang, Zongyuan Wu, Lin Qu
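For readers unfamiliar with triangle-inequality pruning, the following sketch shows the classic pivot-based form of the idea. It is not TRIM's enhanced variant, whose details the truncated abstract does not give. For any metric d and pivot p, |d(q, p) - d(x, p)| <= d(q, x), so precomputed data-to-pivot distances give a cheap lower bound that lets an exact nearest-neighbor scan skip the full distance computation for vectors that cannot beat the current best. Euclidean distance, the toy 2-D data, and the names `build_pivot_table` and `nn_with_pruning` are illustrative assumptions.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_table(data, pivots):
    """Offline step: precompute d(x, p) for every data vector x and pivot p."""
    return [[euclidean(x, p) for p in pivots] for x in data]


def nn_with_pruning(query, data, pivots, table):
    """Exact 1-NN scan that skips vectors whose triangle-inequality lower bound
    is already >= the best distance found so far.
    Returns (best_index, best_distance, full_distance_computations)."""
    q_to_p = [euclidean(query, p) for p in pivots]
    best_i, best_d, computed = -1, math.inf, 0
    for i, x in enumerate(data):
        # |d(q,p) - d(x,p)| <= d(q,x) for any metric, so lb never overestimates d(q,x).
        lb = max((abs(qp - xp) for qp, xp in zip(q_to_p, table[i])), default=0.0)
        if lb >= best_d:
            continue  # x cannot beat the current best; skip the full distance
        d = euclidean(query, x)
        computed += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, computed


data = [(0, 0), (10, 0), (0, 10), (1, 1)]
pivots = [(0, 0)]
table = build_pivot_table(data, pivots)
print(nn_with_pruning((9, 1), data, pivots, table))
```

Here the query (9, 1) needs three full distance computations instead of four: once (10, 0) is found at distance sqrt(2), the bound for (1, 1) is about 7.64 and it is skipped. Pruning only pays off when the lower bounds are tight relative to the best distance found so far.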

RubikSQL: Lifelong Learning Agentic Knowledge Base as an Industrial NL2SQL System

We present RubikSQL, a novel NL2SQL system designed to address key challenges in real-world enterprise-level NL2SQL, such as implicit intents and domain-specific terminology. RubikSQL frames NL2SQL as a lifelong learning task, demanding…

Databases · Computer Science 2025-08-26 Zui Chen, Han Li, Xinhao Zhang, Xiaoyu Chen, Chunyin Dong, Yifeng Wang, Xin Cai, Su Zhang, Ziqi Li, Chi Ding, Jinxu Li, Shuai Wang, Dousheng Zhao, Sanhai Gao, Guangyi Liu

SEFRQO: A Self-Evolving Fine-Tuned RAG-Based Query Optimizer

Query optimization is a crucial problem in database systems that has been studied for decades. Learned query optimizers (LQOs) can improve performance over time by incorporating feedback; however, they suffer from cold-start issues and…

Databases · Computer Science 2025-08-26 Hanwen Liu, Qihan Zhang, Ryan Marcus, Ibrahim Sabek

Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations

Tables are a prevalent format for structured data, yet their metadata, such as semantic types and column relationships, is often incomplete or ambiguous. Column annotation tasks, including Column Type Annotation (CTA) and Column Property…

Databases · Computer Science 2025-08-26 Zhihao Ding, Yongkang Sun, Jieming Shi

AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark

Data cleaning is a time-consuming and error-prone manual process, even with modern workflow tools such as OpenRefine. We present AutoDCWorkflow, an LLM-based pipeline for automatically generating data-cleaning workflows. The pipeline takes…

Databases · Computer Science 2025-08-26 Lan Li, Liri Fang, Bertram Ludäscher, Vetle I. Torvik

A large collection of bioinformatics question-query pairs over federated knowledge graphs: methodology and applications

Background. In the last decades, several life science resources have structured data using the same framework and made these accessible using the same query language to facilitate interoperability. Knowledge graphs have seen increased…

Databases · Computer Science 2025-08-26 Jerven Bolleman, Vincent Emonet, Adrian Altenhoff, Amos Bairoch, Marie-Claude Blatter, Alan Bridge, Severine Duvaud, Elisabeth Gasteiger, Dmitry Kuznetsov, Sebastien Moretti, Pierre-Andre Michel, Anne Morgat, Marco Pagni, Nicole Redaschi, Monique Zahn-Zabal, Tarcisio Mendes de Farias, Ana Claudia Sima

Combined Approximations for Uniform Operational Consistent Query Answering

Operational consistent query answering (CQA) is a recent framework for CQA based on revised definitions of repairs, which are built by applying a sequence of operations (e.g., fact deletions) starting from an inconsistent database until we…

Databases · Computer Science 2025-08-25 Marco Calautti, Ester Livshits, Andreas Pieris, Markus Schneider