Databases - Scifaro

Towards Cross-Model Efficiency in SQL/PGQ

SQL/PGQ is a new standard that integrates graph querying into relational systems, allowing users to freely switch between graph patterns and SQL. Our experiments show performance gaps between these models, as queries written in both…

Databases · Computer Science 2025-05-13 Hadar Rotschield, Liat Peterfreund

TierBase: A Workload-Driven Cost-Optimized Key-Value Store

In the current era of data-intensive applications, the demand for high-performance, cost-effective storage solutions is paramount. This paper introduces a Space-Performance Cost Model for key-value stores, designed to guide cost-effective…

Databases · Computer Science 2025-05-13 Zhitao Shen, Shiyu Yang, Weibo Chen, Kunming Wang, Yue Li, Jiabao Jin, Wei Jia, Junwei Chen, Yuan Su, Xiaoxia Duan, Wei Chen, Lei Wang, Jie Song, Ruoyi Ruan, Xuemin Lin

Survey of Filtered Approximate Nearest Neighbor Search over the Vector-Scalar Hybrid Data

Filtered approximate nearest neighbor search (FANNS), an extension of approximate nearest neighbor search (ANNS) that incorporates scalar filters, has been widely applied to constrained retrieval of vector data. Despite its growing…

Databases · Computer Science 2025-05-13 Yanjun Lin, Kai Zhang, Zhenying He, Yinan Jing, X. Sean Wang

Exploring Next Token Prediction For Optimizing Databases

The Next Token Prediction paradigm (NTP, for short) lies at the forefront of modern large foundational models that are pre-trained on diverse and large datasets. These models generalize effectively, and have proven to be very successful in…

Databases · Computer Science 2025-05-13 Yeasir Rayhan, Walid G. Aref

Information Theory Strikes Back: New Development in the Theory of Cardinality Estimation

Estimating the cardinality of the output of a query is a fundamental problem in database query processing. In this article, we overview a recently published contribution that casts the cardinality estimation problem as linear optimization…

Databases · Computer Science 2025-05-13 Mahmoud Abo Khamis, Vasileios Nakos, Dan Olteanu, Dan Suciu

Dataset Discovery via Line Charts

Line charts are a valuable tool for data analysis and exploration, distilling essential insights from a dataset. However, access to the underlying dataset behind a line chart is rarely readily available. In this paper, we explore a novel…

Databases · Computer Science 2025-05-13 Daomin Ji, Hui Luo, Zhifeng Bao, J. Shane Culpepper

TODS: An Automated Time Series Outlier Detection System

We present TODS, an automated Time Series Outlier Detection System for research and industrial applications. TODS is a highly modular system that supports easy pipeline construction. The basic building block of TODS is the primitive, which is…

Databases · Computer Science 2025-05-13 Kwei-Herng Lai, Daochen Zha, Guanchu Wang, Junjie Xu, Yue Zhao, Devesh Kumar, Yile Chen, Purav Zumkhawaka, Mingyang Wan, Diego Martinez, Xia Hu

An Automated LLM-based Pipeline for Asset-Level Database Creation to Assess Deforestation Impact

The European Union Deforestation Regulation (EUDR) requires companies to prove their products do not contribute to deforestation, creating a critical demand for precise, asset-level environmental impact data. Current databases lack the…

Databases · Computer Science 2025-05-12 Avanija Menon, Ovidiu Serban

Budgeted Spatial Data Acquisition: When Coverage and Connectivity Matter

Data is undoubtedly becoming a commodity like oil, land, and labor in the 21st century. Although there have been many successful marketplaces for data trading, the existing data marketplaces lack consideration of the case where buyers want…

Databases · Computer Science 2025-05-12 Wenzhe Yang, Shixun Huang, Sheng Wang, Zhiyong Peng

Spatially Disaggregated Energy Consumption and Emissions in End-use Sectors for Germany and Spain

High-resolution energy consumption and emissions datasets are essential for localized policy-making, resource optimization, and climate action planning. They enable municipalities to monitor mitigation strategies and foster engagement among…

Databases · Computer Science 2025-05-09 Shruthi Patil, Noah Pflugradt, Jann M. Weinand, Jürgen Kropp, Detlef Stolten

Beyond Relations: A Case for Elevating to the Entity-Relationship Abstraction

Spurred by a number of recent trends, we make the case that relational database systems should urgently move beyond supporting the basic object-relational model and instead embrace a more abstract data model, specifically, the…

Databases · Computer Science 2025-05-07 Amol Deshpande

Including Bloom Filters in Bottom-up Optimization

Bloom filters are used in query processing to perform early data reduction and improve query performance. The optimal query plan may be different when Bloom filters are used, indicating the need for Bloom filter-aware query optimization. To…

Databases · Computer Science 2025-05-07 Tim Zeyl, Qi Cheng, Reza Pournaghi, Jason Lam, Weicheng Wang, Calvin Wong, Chong Chen, Per-Ake Larson

Esc: An Early-stopping Checker for Budget-aware Index Tuning

Index tuning is a time-consuming process. One major performance bottleneck in existing index tuning systems is the large number of "what-if" query optimizer calls that estimate the cost of a given pair of query and index configuration…

Databases · Computer Science 2025-05-07 Xiaoying Wang, Wentao Wu, Vivek Narasayya, Surajit Chaudhuri

MARIOH: Multiplicity-Aware Hypergraph Reconstruction

Hypergraphs offer a powerful framework for modeling higher-order interactions that traditional pairwise graphs cannot fully capture. However, practical constraints often lead to their simplification into projected graphs, resulting in…

Databases · Computer Science 2025-05-07 Kyuhan Lee, Geon Lee, Kijung Shin

PANDA: Query Evaluation in Submodular Width

In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations…

Databases · Computer Science 2025-05-07 Mahmoud Abo Khamis, Hung Q. Ngo, Dan Suciu

Wii: Dynamic Budget Reallocation In Index Tuning

Index tuning aims to find the optimal index configuration for an input workload. It is often a time-consuming and resource-intensive process, largely attributed to the huge number of "what-if" calls made to the query optimizer during…

Databases · Computer Science 2025-05-06 Xiaoying Wang, Wentao Wu, Chi Wang, Vivek Narasayya, Surajit Chaudhuri

Conformal Prediction for Verifiable Learned Query Optimization

Query optimization is critical in relational databases. Recently, numerous Learned Query Optimizers (LQOs) have been proposed, demonstrating superior performance over traditional hand-crafted query optimizers after short training periods.…

Databases · Computer Science 2025-05-06 Hanwen Liu, Shashank Giridhara, Ibrahim Sabek

BMTree: Designing, Learning, and Updating Piecewise Space-Filling Curves for Multi-Dimensional Data Indexing

Space-filling curves (SFCs, for short) have been widely applied to index multi-dimensional data: the data is first mapped to one dimension, and then a one-dimensional indexing method, e.g., the B-tree, indexes the mapped data. Existing SFCs…

Databases · Computer Science 2025-05-06 Jiangneng Li, Yuang Liu, Zheng Wang, Gao Cong, Cheng Long, Walid G. Aref, Han Mao Kiah, Bin Cui

LogDB: Multivariate Log-based Failure Diagnosis for Distributed Databases (Extended from MultiLog)

Distributed databases, as the core infrastructure software for internet applications, play a critical role in modern cloud services. However, existing distributed databases frequently experience system failures and performance degradation,…

Databases · Computer Science 2025-05-06 Lingzhe Zhang, Tong Jia, Mengxi Jia, Ying Li

Building Scalable AI-Powered Applications with Cloud Databases: Architectures, Best Practices and Performance Considerations

The rapid adoption of AI-powered applications demands high-performance, scalable, and efficient cloud database solutions, as traditional architectures often struggle with AI-driven workloads requiring real-time data access, vector search,…

Databases · Computer Science 2025-05-06 Santosh Bhupathi