Databases - Scifaro

Compressing integer lists with Contextual Arithmetic Trits

Inverted indexes make it possible to query large databases without searching the database at each query. An important line of research is to construct the most efficient inverted indexes, both in terms of compression ratio and time…

Databases · Computer Science 2025-05-06 Yann Barsamian, André Chailloux

Visual Analytics Challenges and Trends in the Age of AI: The BigVis Community Perspective

This report provides insights into the challenges, emerging topics, and opportunities related to human-data interaction and visual analytics in the AI era. The BigVis 2024 organizing committee conducted a survey among experts in the field.…

Databases · Computer Science 2025-05-01 Nikos Bikakis, Panos K. Chrysanthis, Guoliang Li, George Papastefanatos, Lingyun Yu

Birdie: Natural Language-Driven Table Discovery Using Differentiable Search Index

Natural language (NL)-driven table discovery identifies relevant tables from large table repositories based on NL queries. While current deep-learning-based methods using the traditional dense vector search pipeline, i.e.,…

Databases · Computer Science 2025-05-01 Yuxiang Guo, Zhonghao Hu, Yuren Mao, Baihua Zheng, Yunjun Gao, Mingwei Zhou

On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks

Large language models (LLMs) have been shown to be valuable tools for tackling process mining tasks. Existing studies report on their capability to support various data-driven process analyses and even, to some extent, that they are able to…

Databases · Computer Science 2025-05-01 Adrian Rebmann, Fabian David Schmidt, Goran Glavaš, Han van der Aa

Color: A Framework for Applying Graph Coloring to Subgraph Cardinality Estimation

Graph workloads pose a particularly challenging problem for query optimizers. They typically feature large queries made up of entirely many-to-many joins with complex correlations. This puts significant stress on traditional cardinality…

Databases · Computer Science 2025-05-01 Kyle Deeds, Diandre Sabale, Moe Kayali, Dan Suciu

Towards FAIR and federated Data Ecosystems for interdisciplinary Research

Scientific data management is at a critical juncture, driven by exponential data growth, increasing cross-domain dependencies, and a severe reproducibility crisis in modern research. Traditional centralized data management approaches are…

Databases · Computer Science 2025-04-30 Sebastian Beyvers, Jannis Hochmuth, Lukas Brehm, Maria Hansen, Alexander Goesmann, Frank Förster

Synthesizing Scoring Functions for Rankings Using Symbolic Gradient Descent

Given a relation and a ranking of its tuples, but no information about the ranking function, we are interested in synthesizing simple scoring functions that reproduce the ranking. Our system RankHow identifies linear scoring functions that…

Databases · Computer Science 2025-04-30 Zixuan Chen, Panagiotis Manolios, Mirek Riedewald

Cost-based Selection of Provenance Sketches for Data Skipping

Provenance sketches, light-weight indexes that record what data is needed (is relevant) for answering a query, can significantly improve performance of important classes of queries (e.g., HAVING and top-k queries). Given a horizontal…

Databases · Computer Science 2025-04-29 Ziyu Liu, Boris Glavic

Representing and querying data tensors in RDF and SPARQL

Embedding tensors in databases has recently gained in significance, due to the rapid proliferation of machine learning methods (including LLMs) which produce embeddings in the form of tensors. To support emerging use cases hybridizing…

Databases · Computer Science 2025-04-29 Piotr Marciniak, Piotr Sowinski, Maria Ganzha

BQSched: A Non-intrusive Scheduler for Batch Concurrent Queries via Reinforcement Learning

Most large enterprises build predefined data pipelines and execute them periodically to process operational data using SQL queries for various tasks. A key issue in minimizing the overall makespan of these pipelines is the efficient…

Databases · Computer Science 2025-04-29 Chenhao Xu, Chunyu Chen, Jinglin Peng, Jiannan Wang, Jun Gao

Beyond Performance: Measuring the Environmental Impact of Analytical Databases

The exponential growth of data is making query processing increasingly critical for modern computing infrastructure, yet the environmental impact of database operations remains poorly understood and largely overlooked. This paper presents…

Databases · Computer Science 2025-04-29 Michail Bachras, Hans-Arno Jacobsen

FAIR 2.0: Extending the FAIR Guiding Principles to Address Semantic Interoperability

FAIR data presupposes their successful communication between machines and humans while preserving their meaning and reference, requiring all parties involved to share the same background knowledge. Inspired by English as a natural language,…

Databases · Computer Science 2025-04-29 Lars Vogt, Philip Strömert, Nicolas Matentzoglu, Naouel Karam, Marcel Konrad, Manuel Prinz, Roman Baum

Online Marketplace: A Benchmark for Data Management in Microservices

Microservice architectures have become a popular approach for designing scalable distributed applications. Despite their extensive use in industrial settings for over a decade, there is limited understanding of the data management…

Databases · Computer Science 2025-04-29 Rodrigo Laigner, Zhexiang Zhang, Yijian Liu, Leonardo Freitas Gomes, Yongluan Zhou

From Randomized Response to Randomized Index: Answering Subset Counting Queries with Local Differential Privacy

Local Differential Privacy (LDP) is the predominant privacy model for safeguarding individual data privacy. Existing perturbation mechanisms typically require perturbing the original values to ensure acceptable privacy, which inevitably…

Databases · Computer Science 2025-04-25 Qingqing Ye, Liantong Yu, Kai Huang, Xiaokui Xiao, Weiran Liu, Haibo Hu

Storing and Querying Evolving Graphs in NoSQL Storage Models

This paper investigates advanced storage models for evolving graphs, focusing on the efficient management of historical data and the optimization of global query performance. Evolving graphs, which represent dynamic relationships between…

Databases · Computer Science 2025-04-25 Alexandros Spitalas, Anastasios Gounaris, Andreas Kosmatopoulos, Kostas Tsichlas

Evaluating Learned Query Performance Prediction Models at LinkedIn: Challenges, Opportunities, and Findings

Recent advancements in learning-based query performance prediction models have demonstrated remarkable efficacy. However, these models are predominantly validated using synthetic datasets focused on cardinality or latency estimations. This…

Databases · Computer Science 2025-04-25 Chujun Song, Slim Bouguerra, Erik Krogen, Daniel Abadi

How to Grow an LSM-tree? Towards Bridging the Gap Between Theory and Practice

LSM-tree-based key-value stores are widely adopted as the data storage backend in modern big data applications. The LSM-tree grows with data ingestion, by either adding levels with fixed level capacities (dubbed the vertical scheme) or…

Databases · Computer Science 2025-04-25 Dingheng Mo, Siqiang Luo, Stratos Idreos

Transactional Cloud Applications: Status Quo, Challenges, and Opportunities

Transactional cloud applications such as payment, booking, reservation systems, and complex business workflows are currently being rewritten for deployment in the cloud. This migration to the cloud is happening mainly for reasons of cost…

Databases · Computer Science 2025-04-25 Rodrigo Laigner, George Christodoulou, Kyriakos Psarakis, Asterios Katsifodimos, Yongluan Zhou

Rel: A Programming Language for Relational Data

From the moment of their inception, languages for relational data have been described as sublanguages embedded in a host programming language. Rel is a new relational language whose key design goal is to go beyond this paradigm with…

Databases · Computer Science 2025-04-25 Molham Aref, Paolo Guagliardo, George Kastrinis, Leonid Libkin, Victor Marsault, Wim Martens, Mary McGrath, Filip Murlak, Nathaniel Nystrom, Liat Peterfreund, Allison Rogers, Cristina Sirangelo, Domagoj Vrgoc, David Zhao, Abdul Zreika

HotStuff-1: Linear Consensus with One-Phase Speculation

This paper introduces HotStuff-1, a BFT consensus protocol that improves the latency of HotStuff-2 by two network hops while maintaining linear communication complexity against faults. Furthermore, HotStuff-1 incorporates an…

Databases · Computer Science 2025-04-25 Dakai Kang, Suyash Gupta, Dahlia Malkhi, Mohammad Sadoghi