Databases - Scifaro

Is Long Context All You Need? Leveraging LLM's Extended Context for NL2SQL

Large Language Models (LLMs) have demonstrated impressive capabilities across a range of natural language processing tasks. In particular, improvements in reasoning abilities and the expansion of context windows have opened new avenues for…

Databases · Computer Science 2025-06-12 Yeounoh Chung, Gaurav T. Kakkar, Yu Gan, Brenton Milne, Fatma Ozcan

Evaluating Learned Indexes in LSM-tree Systems: Benchmarks, Insights and Design Choices

LSM-tree-based data stores are widely used in industry due to their exceptional performance. However, as data volumes grow, efficiently querying large-scale databases becomes increasingly challenging. To address this, recent studies…

Databases · Computer Science 2025-06-11 Junfeng Liu, Jiarui Ye, Mengshi Chen, Meng Li, Siqiang Luo

The Temporal Vadalog System: Temporal Datalog-based Reasoning

In the wake of the recent resurgence of the Datalog language of databases, together with its extensions for ontological reasoning settings, this work aims to bridge the gap between the theoretical studies of DatalogMTL (Datalog extended…

Databases · Computer Science 2025-06-11 Luigi Bellomarini, Livia Blasi, Markus Nissl, Emanuel Sallinger

BVLSM: Write-Efficient LSM-Tree Storage via WAL-Time Key-Value Separation

Modern data-intensive applications increasingly store and process big-value items, such as multimedia objects and machine learning embeddings, which exacerbate storage inefficiencies in Log-Structured Merge-Tree (LSM)-based key-value…

Databases · Computer Science 2025-06-10 Ming Li, Wendi Cheng, Jiahe Wei, Xueqiang Shan, Weikai Liu, Xiaonan Zhao, Xiao Zhang

SIFBench: An Extensive Benchmark for Fatigue Analysis

Fatigue-induced crack growth is a leading cause of structural failure across critical industries such as aerospace, civil engineering, automotive, and energy. Accurate prediction of stress intensity factors (SIFs) -- the key parameters…

Databases · Computer Science 2025-06-10 Tushar Gautam, Robert M. Kirby, Jacob Hochhalter, Shandian Zhe

mobilityDCAT-AP: a Metadata Specification for Enhanced Cross-border Mobility Data Sharing

Integrated and efficient mobility requires data sharing among the involved stakeholders. In this direction, regulators and transport authorities have been defining policies to foster the digitalisation and online publication of mobility…

Databases · Computer Science 2025-06-10 Mario Scrocca, Lina Molinas Comet, Benjamin Witsch, Daham Mohammed Mustafa, Christoph Lange, Marco Comerio, Peter Lubrich

Can the Rookies Cut the Tough Cookie? Exploring the Use of LLMs for SQL Equivalence Checking

Equivalence checking of SQL queries is an intractable problem often encountered in settings ranging from grading SQL submissions to debugging query optimizers. Despite recent work toward developing practical solutions, only simple queries…

Databases · Computer Science 2025-06-10 Rajat Singh, Srikanta Bedathur

PairwiseHist: Fast, Accurate and Space-Efficient Approximate Query Processing with Data Compression

Exponential growth in data collection is creating significant challenges for data storage and analytics latency. Approximate Query Processing (AQP) has long been touted as a solution for accelerating analytics on large datasets, however,…

Databases · Computer Science 2025-06-10 Aaron Hurst, Daniel E. Lucani, Qi Zhang

Stream DaQ: Stream-First Data Quality Monitoring

Data quality is fundamental to modern data science workflows, where data continuously flows as unbounded streams feeding critical downstream tasks, from elementary analytics to advanced artificial intelligence models. Existing data quality…

Databases · Computer Science 2025-06-09 Vasileios Papastergios, Anastasios Gounaris

PandasBench: A Benchmark for the Pandas API

The Pandas API has been central to the success of pandas and its alternatives. Despite its importance, there is no benchmark for it, and we argue that we cannot repurpose existing benchmarks (from other domains) for the Pandas API. In this…

Databases · Computer Science 2025-06-09 Alex Broihier, Stefanos Baziotis, Daniel Kang, Charith Mendis

Rethinking OWL Expressivity: Semantic Units for FAIR and Cognitively Interoperable Knowledge Graphs - Why OWLs don't have to understand everything they say

Semantic knowledge graphs are foundational to implementing the FAIR Principles, yet RDF/OWL representations often lack the semantic flexibility and cognitive interoperability required in scientific domains. We present a novel framework for…

Databases · Computer Science 2025-06-09 Lars Vogt

PathFinder: A unified approach for handling paths in graph query languages

Path queries are a core feature of modern graph query languages such as Cypher, SQL/PGQ, and GQL. These languages provide a rich set of features for matching paths, such as restricting to certain path modes (shortest, simple, trail) and…

Databases · Computer Science 2025-06-09 Benjamín Farías, Wim Martens, Carlos Rojas, Domagoj Vrgoč

Memory Hierarchy Design for Caching Middleware in the Age of NVM

Advances in storage technology have introduced Non-Volatile Memory (NVM) as a new storage medium. NVM, along with Dynamic Random Access Memory (DRAM), Solid State Disk (SSD), and Disk, presents a system designer with a wide array of options…

Databases · Computer Science 2025-06-06 Shahram Ghandeharizadeh, Sandy Irani, Jenny Lam

Computationally Intensive Research: Advancing a Role for Secondary Analysis of Qualitative Data

This paper draws attention to the potential of computational methods in reworking data generated in past qualitative studies. While qualitative inquiries often produce rich data through rigorous and resource-intensive processes, much of…

Databases · Computer Science 2025-06-06 Kaveh Mohajeri, Amir Karami

More Bang For Your Buck(et): Fast and Space-efficient Hardware-accelerated Coarse-granular Indexing on GPUs

In recent work, we have shown that NVIDIA's raytracing cores on RTX video cards can be exploited to realize hardware-accelerated lookups for GPU-resident database indexes. On a high level, the concept materializes all keys as triangles in a…

Databases · Computer Science 2025-06-06 Justus Henneberg, Felix Schuhknecht, Rosina Kharal, Trevor Brown

TransClean: Finding False Positives in Multi-Source Entity Matching under Real-World Conditions via Transitive Consistency

We present TransClean, a method for detecting false positive predictions of entity matching algorithms under real-world conditions characterized by large-scale, noisy, and unlabeled multi-source datasets that undergo distributional shifts.…

Databases · Computer Science 2025-06-05 Fernando de Meer Pardo, Branka Hadji Misheva, Martin Braschler, Kurt Stockinger

Raster Interval Object Approximations for Spatial Intersection Joins

Spatial join processing techniques that identify intersections between complex geometries (e.g., polygons) commonly follow a two-step filter-and-refine pipeline. The filter step evaluates the query predicate on the minimum bounding…

Databases · Computer Science 2025-06-05 Thanasis Georgiadis, Eleni Tzirita Zacharatou, Nikos Mamoulis

A Learned Cost Model-based Cross-engine Optimizer for SQL Workloads

Lakehouse systems enable the same data to be queried with multiple execution engines. However, selecting the engine best suited to run a SQL query still requires a priori knowledge of the query computational requirements and an engine…

Databases · Computer Science 2025-06-04 András Strausz, Niels Pardon, Ioana Giurgiu

In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration

Entity Resolution (ER) is a fundamental data quality improvement task that identifies and links records referring to the same real-world entity. Traditional ER approaches often rely on pairwise comparisons, which can be costly in terms of…

Databases · Computer Science 2025-06-04 Jiajie Fu, Haitong Tang, Arijit Khan, Sharad Mehrotra, Xiangyu Ke, Yunjun Gao

Retrieval-Augmented Generation of Ontologies from Relational Databases

Transforming relational databases into knowledge graphs with enriched ontologies enhances semantic interoperability and unlocks advanced graph-based learning and reasoning over data. However, previous approaches either demand significant…

Databases · Computer Science 2025-06-03 Mojtaba Nayyeri, Athish A Yogi, Nadeen Fathallah, Ratan Bahadur Thapa, Hans-Michael Tautenhahn, Anton Schnurpel, Steffen Staab