Related papers: Efficient Query Re-optimization with Judicious Sub…
Cost-based query optimizers remain one of the most important components of database management systems for analytic workloads. Though modern optimizers select plans close to optimal performance in the common case, a small number of queries…
Despite of decades of work, query optimizers still make mistakes on "difficult" queries because of bad cardinality estimates, often due to the interaction of multiple predicates and correlations in the data. In this paper, we propose a…
Traditionally, query optimizers have been designed for computer systems that share a common architecture, consisting of a CPU, main memory and disk subsystem. The efficiency of query optimizers and their successful employment relied on the…
As declarative query processing techniques expand in scope --- to the Web, data streams, network routers, and cloud platforms --- there is an increasing need for adaptive query processing techniques that can re-plan in the presence of…
Query Optimization remains an open problem for Big Data Management Systems. Traditional optimizers are cost-based and use statistical estimates of intermediate result cardinalities to assign costs and pick the best plan. However, such…
Traditional query optimizers are designed to be fast and stateless: each query is quickly optimized using approximate statistics, sent off to the execution engine, and promptly forgotten. Recent work on learned query optimization have shown…
Performance-critical industrial applications, including large-scale program, network, and distributed system analyses, are increasingly reliant on recursive queries for data analysis. Yet traditional relational algebra-based query…
Query optimization remains one of the most important and well-studied problems in database systems. However, traditional query optimizers are complex heuristically-driven systems, requiring large amounts of time to tune for a particular…
Query optimization has become a research area where classical algorithms are being challenged by machine learning algorithms. At the same time, recent trends in learned query optimizers have shown that it is prudent to take advantage of…
Data processing systems offer an ever increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query…
Automatic query reformulation refers to rewriting a user's original query in order to improve the ranking of retrieval results compared to the original query. We present a general framework for automatic query reformulation based on…
As database query processing techniques are being used to handle diverse workloads, a key emerging challenge is how to efficiently handle multi-way join queries containing multiple many-to-many joins. While uncommon in traditional…
Most recently, researchers have started building large language models (LLMs) powered data systems that allow users to analyze unstructured text documents like working with a database because LLMs are very effective in extracting attributes…
We study the problem of optimizing subgraph queries using the new worst-case optimal join plans. Worst-case optimal plans evaluate queries by matching one query vertex at a time using multiway intersections. The core problem in optimizing…
Evaluating query predicates on data samples is the only way to estimate their selectivity in certain scenarios. Finding a guaranteed optimal query plan is not a reasonable optimization goal in those cases as it might require an infinite…
The query optimizer is a fundamental component of database management systems. Recent studies have shown that learned query optimizers outperform traditional cost-based query optimizers. However, they fail to exploit valuable runtime…
Analytics database workloads often contain queries that are executed repeatedly. Existing optimization techniques generally prioritize keeping optimization cost low, normally well below the time it takes to execute a single instance of a…
Join order selection plays a significant role in query performance. However, modern query optimizers typically employ static join enumeration algorithms that do not receive any feedback about the quality of the resulting plan. Hence,…
Minimizing intermediate results is critical for efficient multi-join query processing. Although the seminal Yannakakis algorithm offers strong guarantees for acyclic queries, cyclic queries remain an open challenge. In this paper, we…
Jupyter notebooks represent a unique format for programming - a combination of code and Markdown with rich formatting, separated into individual cells. We propose to perceive a Jupyter Notebook cell as a simplified and raw version of a…