Related papers: Bitvector-aware Query Optimization for Decision Su…
One of the central problems in the design of compressed data structures is the efficient support for rank and select queries on bitvectors. These two operations form the backbone of more complex data structures (such as wavelet trees) used…
As database query processing techniques are being used to handle diverse workloads, a key emerging challenge is how to efficiently handle multi-way join queries containing multiple many-to-many joins. While uncommon in traditional…
Bloom filters are used in query processing to perform early data reduction and improve query performance. The optimal query plan may be different when Bloom filters are used, indicating the need for Bloom filter-aware query optimization. To…
We study the problem of optimizing subgraph queries using the new worst-case optimal join plans. Worst-case optimal plans evaluate queries by matching one query vertex at a time using multiway intersections. The core problem in optimizing…
Query optimizer is at the heart of the database systems. Cost-based optimizer studied in this paper is adopted in almost all current database systems. A cost-based optimizer introduces a plan enumeration algorithm to find a (sub)plan, and…
As declarative query processing techniques expand in scope --- to the Web, data streams, network routers, and cloud platforms --- there is an increasing need for adaptive query processing techniques that can re-plan in the presence of…
Join ordering is the NP-hard problem of selecting the most efficient order in which to evaluate joins (conjunctive, binary operators) in a database query. Because query execution performance critically depends on this choice, join ordering…
Query optimization remains one of the most important and well-studied problems in database systems. However, traditional query optimizers are complex heuristically-driven systems, requiring large amounts of time to tune for a particular…
Query processing in search engines can be optimized for use for all queries. For this, system component parameters such as the weighting function or the automatic query expansion model can be optimized or learned from past queries. However,…
Modern data processing applications execute increasingly sophisticated analysis that requires operations beyond traditional relational algebra. As a result, operators in query plans grow in diversity and complexity. Designing query…
Recent work in database query optimization has used complex machine learning strategies, such as customized reinforcement learning schemes. Surprisingly, we show that LLM embeddings of query text contain useful semantic information for…
With the advent of big data applications, which tends to have longer execution time, choosing the right cloud VM to run these applications has significant performance as well as economic implications. For example, in our large-scale…
Vector set search, an underexplored similarity search paradigm, aims to find vector sets similar to a query set. This search paradigm leverages the inherent structural alignment between sets and real-world entities to model more…
Algorithms that exploit sort orders are widely used to implement joins, grouping, duplicate elimination and other set operations. Query optimizers traditionally deal with sort orders by using the notion of interesting orders. The number of…
Inspired by the classical fractional cascading technique, we introduce new techniques to speed up the following type of iterated search in 3D: The input is a graph $\mathbf{G}$ with bounded degree together with a set $H_v$ of 3D hyperplanes…
Filtered ANN search is an increasingly important problem in vector retrieval, yet systems face a difficult trade-off due to the execution order: Pre-filtering (filtering first, then ANN over the passing subset) requires expensive…
Structural decomposition methods offer powerful theoretical guarantees for join evaluation, yet they are rarely used in real-world query optimizers. A major reason is the difficulty of combining cost-based plan search and structure-based…
Set queries are fundamental operations in computer systems and applications.This paper addresses the fundamental problem of designing a probabilistic data structure that can quickly process set queries using a small amount of memory. We…
Cost-based query optimizers remain one of the most important components of database management systems for analytic workloads. Though modern optimizers select plans close to optimal performance in the common case, a small number of queries…
Answering complex logical queries on incomplete knowledge graphs is a challenging task, and has been widely studied. Embedding-based methods require training on complex queries, and cannot generalize well to out-of-distribution query…