数据库
Inverted indexes allow to query large databases without needing to search in the database at each query. An important line of research is to construct the most efficient inverted indexes, both in terms of compression ratio and time…
This report provides insights into the challenges, emerging topics, and opportunities related to human-data interaction and visual analytics in the AI era. The BigVis 2024 organizing committee conducted a survey among experts in the field.…
Natural language (NL)-driven table discovery identifies relevant tables from large table repositories based on NL queries. While current deep-learning-based methods using the traditional dense vector search pipeline, i.e.,…
Large language models (LLMs) have shown to be valuable tools for tackling process mining tasks. Existing studies report on their capability to support various data-driven process analyses and even, to some extent, that they are able to…
Graph workloads pose a particularly challenging problem for query optimizers. They typically feature large queries made up of entirely many-to-many joins with complex correlations. This puts significant stress on traditional cardinality…
Scientific data management is at a critical juncture, driven by exponential data growth, increasing cross-domain dependencies, and a severe reproducibility crisis in modern research. Traditional centralized data management approaches are…
Given a relation and a ranking of its tuples, but no information about the ranking function, we are interested in synthesizing simple scoring functions that reproduce the ranking. Our system RankHow identifies linear scoring functions that…
Provenance sketches, light-weight indexes that record what data is needed (is relevant) for answering a query, can significantly improve performance of important classes of queries (e.g., HAVING and top-k queries). Given a horizontal…
Embedding tensors in databases has recently gained in significance, due to the rapid proliferation of machine learning methods (including LLMs) which produce embeddings in the form of tensors. To support emerging use cases hybridizing…
Most large enterprises build predefined data pipelines and execute them periodically to process operational data using SQL queries for various tasks. A key issue in minimizing the overall makespan of these pipelines is the efficient…
The exponential growth of data is making query processing increasingly critical for modern computing infrastructure, yet the environmental impact of database operations remains poorly understood and largely overlooked. This paper presents…
FAIR data presupposes their successful communication between machines and humans while preserving their meaning and reference, requiring all parties involved to share the same background knowledge. Inspired by English as a natural language,…
Microservice architectures have become a popular approach for designing scalable distributed applications. Despite their extensive use in industrial settings for over a decade, there is limited understanding of the data management…
Local Differential Privacy (LDP) is the predominant privacy model for safeguarding individual data privacy. Existing perturbation mechanisms typically require perturbing the original values to ensure acceptable privacy, which inevitably…
This paper investigates advanced storage models for evolving graphs, focusing on the efficient management of historical data and the optimization of global query performance. Evolving graphs, which represent dynamic relationships between…
Recent advancements in learning-based query performance prediction models have demonstrated remarkable efficacy. However, these models are predominantly validated using synthetic datasets focused on cardinality or latency estimations. This…
LSM-tree based key-value stores are widely adopted as the data storage backend in modern big data applications. The LSM-tree grows with data ingestion, by either adding levels with fixed level capacities (dubbed as vertical scheme) or…
Transactional cloud applications such as payment, booking, reservation systems, and complex business workflows are currently being rewritten for deployment in the cloud. This migration to the cloud is happening mainly for reasons of cost…
From the moment of their inception, languages for relational data have been described as sublanguages embedded in a host programming language. Rel is a new relational language whose key design goal is to go beyond this paradigm with…
This paper introduces HotStuff-1, a BFT consensus protocol that improves the latency of HotStuff-2 by two network hops while maintaining linear communication complexity against faults. Furthermore, HotStuff-1 incorporates an…