Related papers: A Simple Algorithm for Worst-Case Optimal Join and…
Worst-case optimal join algorithms are the class of join algorithms whose runtime match the worst-case output size of a given join query. While the first provably worst-case optimal join algorithm was discovered relatively recently, the…
Uniform sampling and approximate counting are fundamental primitives for modern database applications, ranging from query optimization to approximate query processing. While recent breakthroughs have established optimal sampling and…
Top-k queries have been studied intensively in the database community and they are an important means to reduce query cost when only the "best" or "most interesting" results are needed instead of the full output. While some optimality…
Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries…
Worst-case optimal join algorithms have so far been studied in two broad contexts -- $(1)$ when we are given input relation sizes [Atserias et al., FOCS 2008, Ngo et al., PODS 2012, Velduizhen et. al, ICDT 2014] $(2)$ when in addition to…
In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful…
In the last few years, much effort has been devoted to developing join algorithms in order to achieve worst-case optimality for join queries over relational databases. Towards this end, the database community has had considerable success in…
In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with $p$ servers. In contrast to previous work, where upper and lower bounds on the…
Worst-case optimal join algorithms have gained a lot of attention in the database literature. We now count with several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice.…
Evaluating the relational join is one of the central algorithmic and most well-studied problems in database systems. A staggering number of variants have been considered including Block-Nested loop join, Hash-Join, Grace, Sort-merge for…
We propose a new method for estimating the number of answers OUT of a small join query Q in a large database D, and for uniform sampling over joins. Our method is the first to satisfy all the following statements. - Support arbitrary Q,…
The performance of worst-case optimal join algorithms depends on the order in which the join attributes are processed. Selecting good orders before query execution is hard, due to the large space of possible orders and unreliable execution…
Given a join with an acyclic set of degree constraints, we show how to draw a uniformly random sample from the join result in $O(\mathit{polymat}/ \max \{1, \mathrm{OUT} \})$ expected time after a preprocessing of $O(\mathrm{IN})$ expected…
We study ranked enumeration of join-query results according to very general orders defined by selective dioids. Our main contribution is a framework for ranked enumeration over a class of dynamic programming problems that generalizes…
We study the problem of optimizing subgraph queries using the new worst-case optimal join plans. Worst-case optimal plans evaluate queries by matching one query vertex at a time using multiway intersections. The core problem in optimizing…
We study the hardness of Approximate Query Processing (AQP) of various types of queries involving joins over multiple tables of possibly different sizes. In the case where the query result is a single value (e.g., COUNT, SUM, and…
We optimize multiway equijoins on relational tables using degree information. We give a new bound that uses degree information to more tightly bound the maximum output size of a query. On real data, our bound on the number of triangles in a…
We introduce the problem of Poisson sampling over joins: compute a sample of the result of a join query by conceptually performing a Bernoulli trial for each join tuple, using a non-uniform and tuple-specific probability. We propose an…
Despite decades of research on approximate query processing (AQP), our understanding of sample-based joins has remained limited and, to some extent, even superficial. The common belief in the community is that joining random samples is…
One of the most celebrated results of computing join-aggregate queries defined over commutative semi-rings is the classic Yannakakis algorithm proposed in 1981. It is known that the runtime of the Yannakakis algorithm is $O(N + \OUT)$ for…