Databases - Scifaro

Shirakami: A Hybrid Concurrency Control Protocol for Tsurugi Relational Database System

Bill-of-materials and telecommunications billing applications need to process both short transactions and long read-write transactions simultaneously. Recent work rarely addresses such evolving workloads. To deal with these workloads, we…

Databases · Computer Science 2026-03-19 Takayuki Tanabe, Shinichi Umegane, Suguru Arakawa, Ryoji Kurosawa, Takashi Hoshino, Hideyuki Kawashima, Masahiro Tanaka, Takashi Kambayashi

Practical MCTS-based Query Optimization: A Reproducibility Study and new MCTS algorithm for complex queries

Monte Carlo Tree Search (MCTS) has been proposed as a transformative approach to join-order optimization in database query processing, with recent frameworks such as AlphaJoin and HyperQO claiming to outperform traditional methods. However,…

Databases · Computer Science 2026-03-18 Vladimir Burlakov, Alena Rybakina, Sergey Kudashev, Konstantin Gilev, Alexander Demin, Denis Ponomaryov, Yuriy Dorn

MFTune: An Efficient Multi-fidelity Framework for Spark SQL Configuration Tuning

Apache Spark SQL is a cornerstone of modern big data analytics. However, optimizing Spark SQL performance is challenging due to its vast configuration space and the prohibitive cost of evaluating massive workloads. Existing tuning methods…

Databases · Computer Science 2026-03-18 Beicheng Xu, Lingching Tung, Yuchen Wang, Yupeng Lu, Bin Cui

Work Sharing and Offloading for Efficient Approximate Threshold-based Vector Join

Vector joins - finding all vector pairs between a set of query and data vectors whose distances are below a given threshold - are fundamental to modern vector and vector-relational database systems that power multimodal retrieval and…

Databases · Computer Science 2026-03-18 Kyoungmin Kim, Lennart Roth, Liang Liang, Anastasia Ailamaki

Dialect-Agnostic SQL Parsing via LLM-Based Segmentation

SQL is a widely adopted language for querying data, which has led to the development of various SQL analysis and rewriting tools. However, due to the diversity of SQL dialects, such tools often fail when encountering unrecognized…

Databases · Computer Science 2026-03-18 Junwen An, Kabilan Mahathevan, Manuel Rigger

Accelerating Approximate Analytical Join Queries over Unstructured Data with Statistical Guarantees

Analytical join queries over unstructured data are increasingly prevalent in data analytics. Applying machine learning (ML) models to label every pair in the cross product of tables can achieve state-of-the-art accuracy, but the cost of…

Databases · Computer Science 2026-03-18 Yuxuan Zhu, Tengjun Jin, Chenghao Mo, Daniel Kang

BEACON: Budget-Aware Entity Matching Across Domains (Extended Technical Report)

Entity Matching (EM)--the task of determining whether two data records refer to the same real-world entity--is a core task in data integration. Recent advances in deep learning have set a new standard for EM, particularly through…

Databases · Computer Science 2026-03-18 Nicholas Pulsone, Roee Shraga, Gregory Goren

Workload-Aware Incremental Reclustering in Cloud Data Warehouses

Modern cloud data warehouses store data in micro-partitions and rely on metadata (e.g., zonemaps) for efficient data pruning during query processing. Maintaining data clustering in a large-scale table is crucial for effective data pruning.…

Databases · Computer Science 2026-03-18 Yipeng Liu, Renfei Zhou, Jiaqi Yan, Huanchen Zhang

Direct Access for Conjunctive Queries with Negations

Given a conjunctive query $Q$ and a database $D$, a direct access to the answers of $Q$ over $D$ is the operation of returning, given an index $k$, the $k$-th answer for some order on its answers. While this problem is $\#\mathcal{P}$-hard…

Databases · Computer Science 2026-03-18 Florent Capelli, Nofar Carmeli, Oliver Irwin, Sylvain Salvati

DOT: Dynamic Knob Selection and Online Sampling for Automated Database Tuning

Database Management Systems (DBMS) are crucial for efficient data management and access control, but their administration remains challenging for Database Administrators (DBAs). Tuning, in particular, is known to be difficult. Modern…

Databases · Computer Science 2026-03-17 Yifan Wang, Debabrota Basu, Pierre Bourhis, Romain Rouvoy, Patrick Royer

Succinct Structure Representations for Efficient Query Optimization

Structural decomposition methods offer powerful theoretical guarantees for join evaluation, yet they are rarely used in real-world query optimizers. A major reason is the difficulty of combining cost-based plan search and structure-based…

Databases · Computer Science 2026-03-17 Zhekai Jiang, Qichen Wang, Christoph Koch

Nova: Scalable Streaming Join Placement and Parallelization in Resource-Constrained Geo-Distributed Environments

Real-time data processing in large geo-distributed applications, like the Internet of Things (IoT), increasingly shifts computation from the cloud to the network edge to reduce latency and mitigate network congestion. In this setting,…

Databases · Computer Science 2026-03-17 Xenofon Chatziliadis, Eleni Tzirita Zacharatou, Samira Akili, Alphan Eracar, Volker Markl

Size Bound-Adorned Datalog

We introduce EDB-bounded datalog, a framework for deriving upper bounds on intermediate result sizes and the asymptotic complexity of recursive queries in datalog. We present an algorithm that, given an arbitrary datalog program, constructs…

Databases · Computer Science 2026-03-17 Christian Fattebert, Zhekai Jiang, Christoph Koch, Reinhard Pichler, Qichen Wang

Towards Parameterized Hardness on Maintaining Conjunctive Queries

We investigate the fine-grained complexity of dynamically maintaining the result of fixed self-join free conjunctive queries under single-tuple updates. Prior work shows that free-connex queries can be maintained in update time…

Databases · Computer Science 2026-03-17 Qichen Wang

Shape-Agnostic Table Overlap Discovery: A Maximum Common Subhypergraph Approach

Understanding how two tables overlap is useful for many data management tasks, but challenging because tables often differ in row and column orders and lack reliable metadata in practice. Prior work defines the largest rectangular overlap,…

Databases · Computer Science 2026-03-17 Ge Lee, Shixun Huang, Zhifeng Bao, Felix Naumann, Shazia Sadiq, Yanchang Zhao

Causal Search for Skylines (CSS): Causally-Informed Selective Data De-Correlation

Skyline queries are popular and effective tools in multi-criteria decision support as they extract interesting (Pareto-optimal) points that help summarize the available data with respect to a given set of preference attributes.…

Databases · Computer Science 2026-03-17 Pratanu Mandal, Abhinav Gorantla, K. Selçuk Candan, Maria Luisa Sapino

Wheel Dynamic Load Estimation Method Based on Gas Pressure of Hydro-pneumatic Suspension

This paper proposes a novel method to estimate the wheel dynamic load based on the gas pressure of a hydro-pneumatic suspension. A nonlinear coupled model between suspension chamber pressure and tire-ground contact force is developed,…

Databases · Computer Science 2026-03-17 Qijun Liao, Jue Yang, Subhash Rakheja, Yiting Kang, Yumeng Yao, Yuming Yin

ATCC: Adaptive Concurrency Control for Unforeseen Agentic Transactions

Data agents, empowered by Large Language Models (LLMs), introduce a new paradigm in transaction processing. Unlike traditional applications with fixed patterns, data agents run online-generated workflows that repeatedly issue SQL…

Databases · Computer Science 2026-03-17 Weixing Zhou, Zhiyou Wang, Zeshun Peng, Hetian Chen, Yanfeng Zhang, Ge Yu

Concurrency Control as a Service

Existing disaggregated databases separate execution and storage layers, enabling independent and elastic scaling of resources. In most cases, this design makes transaction concurrency control (CC) a critical bottleneck, which demands…

Databases · Computer Science 2026-03-17 Weixing Zhou, Yanfeng Zhang, Xinji Zhou, Zhiyou Wang, Zeshun Peng, Yang Ren, Sihao Li, Huanchen Zhang, Guoliang Li, Ge Yu

MICRO: A Lightweight Middleware for Optimizing Cross-store Cross-model Graph-Relation Joins [Technical Report]

Modern data applications increasingly involve heterogeneous data managed in different models and stored across disparate database engines, often deployed as separate installs. Limited research has addressed cross-model query processing in…

Databases · Computer Science 2026-03-17 Xiuwen Zheng, Arun Kumar, Amarnath Gupta