Related papers: Instance Optimal Join Size Estimation

Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems

Worst-case optimal join algorithms are the class of join algorithms whose runtime matches the worst-case output size of a given join query. While the first provably worst-case optimal join algorithm was discovered relatively recently, the…

Databases · Computer Science 2018-06-27 Hung Q. Ngo

Optimal Joins using Compact Data Structures

Worst-case optimal join algorithms have gained a lot of attention in the database literature. We now have several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice.…

Databases · Computer Science 2020-01-10 Gonzalo Navarro, Juan L. Reutter, Javiel Rojas-Ledesma

Worst-case Optimal Join Algorithms

Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries…

Databases · Computer Science 2012-03-12 Hung Q. Ngo, Ely Porat, Christopher Ré, Atri Rudra

We study the problem of similarity self-join and similarity join size estimation in a streaming setting where the goal is to estimate, in one scan of the input and with sublinear space in the input size, the number of record pairs that have…

Databases · Computer Science 2020-05-11 Davood Rafiei, Fan Deng

Subset-Based Instance Optimality in Private Estimation

We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm…

Machine Learning · Computer Science 2024-05-30 Travis Dick, Alex Kulesza, Ziteng Sun, Ananda Theertha Suresh

Fast Join Project Query Evaluation using Matrix Multiplication

In the last few years, much effort has been devoted to developing join algorithms in order to achieve worst-case optimality for join queries over relational databases. Towards this end, the database community has had considerable success in…

Databases · Computer Science 2020-03-02 Shaleen Deep, Xiao Hu, Paraschos Koutris

Optimal Join Algorithms Meet Top-k

Top-k queries have been studied intensively in the database community and they are an important means to reduce query cost when only the "best" or "most interesting" results are needed instead of the full output. While some optimality…

Databases · Computer Science 2020-05-04 Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

Instance-Optimality in I/O-Efficient Sampling and Sequential Estimation

Suppose we have a memory storing $0$s and $1$s and we want to estimate the frequency of $1$s by sampling. We want to do this I/O-efficiently, exploiting that each read gives a block of $B$ bits at unit cost; not just one bit. If the input…

Data Structures and Algorithms · Computer Science 2024-10-21 Shyam Narayanan, Václav Rozhoň, Jakub Tětek, Mikkel Thorup

A Simple Algorithm for Worst-Case Optimal Join and Sampling

We present an elementary branch and bound algorithm with a simple analysis of why it achieves worst-case optimality for join queries on classes of databases defined respectively by cardinality or acyclic degree constraints. We then show that…

Databases · Computer Science 2024-09-24 Florent Capelli, Oliver Irwin, Sylvain Salvati

Instance and Output Optimal Parallel Algorithms for Acyclic Joins

Massively parallel join algorithms have received much attention in recent years, while most prior work has focused on worst-case optimal algorithms. However, the worst-case optimality of these join algorithms relies on hard instances having very…

Databases · Computer Science 2019-04-01 Xiao Hu, Ke Yi

Towards Efficient Random-Order Enumeration for Join Queries

In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful…

Databases · Computer Science 2025-07-02 Pengyu Chen, Zizheng Guo, Jianwei Yang, Dongjing Miao

Guaranteeing the $\tilde{O}(\mathrm{AGM}/\mathrm{OUT})$ Runtime for Uniform Sampling and OUT Size Estimation over Joins

We propose a new method for estimating the number of answers OUT of a small join query Q in a large database D, and for uniform sampling over joins. Our method is the first to satisfy all the following statements. - Support arbitrary Q,…

Databases · Computer Science 2023-04-11 Kyoungmin Kim, Jaehyun Ha, George Fletcher, Wook-Shin Han

Solvable Integration Problems and Optimal Sample Size Selection

We compute the integral of a function or the expectation of a random variable with minimal cost and use, for our new algorithm and for upper bounds of the complexity, i.i.d. samples. Under certain assumptions it is possible to select a…

Numerical Analysis · Mathematics 2018-10-24 Robert J. Kunsch, Erich Novak, Daniel Rudolf

On the Fair Comparison of Optimization Algorithms in Different Machines

An experimental comparison of two or more optimization algorithms requires the same computational resources to be assigned to each algorithm. When a maximum runtime is set as the stopping criterion, all algorithms need to be executed in the…

Performance · Computer Science 2024-02-09 Etor Arza, Josu Ceberio, Ekhiñe Irurozki, Aritz Pérez

Selectivity Estimation of Inequality Joins In Databases

Selectivity estimation refers to the ability of the SQL query optimizer to estimate the size of the results of a predicate in the query. It is the main calculation, based on which the optimizer can select the cheapest plan to execute. While…

Databases · Computer Science 2022-06-16 Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr

Sample size estimation for power and accuracy in the experimental comparison of algorithms

Experimental comparisons of performance represent an important aspect of research on optimization algorithms. In this work we present a methodology for defining the required sample sizes for designing experiments with desired statistical…

Neural and Evolutionary Computing · Computer Science 2018-10-16 Felipe Campelo, Fernanda Takahashi

Multi-Agent Join

It is crucial to provide real-time performance in many applications, such as interactive and exploratory data analysis. In these settings, users often need to view subsets of query results quickly. It is challenging to deliver such results…

Databases · Computer Science 2023-12-25 Vahid Ghadakchi, Mian Xie, Arash Termehchy, Bakhtiyar Doskenov, Bharghav Srikhakollu, Summit Haque, Huazheng Wang

Scheduling With Inexact Job Sizes: The Merits of Shortest Processing Time First

It is well known that size-based scheduling policies, which take into account job size (i.e., the time it takes to run them), can perform very desirably in terms of both response time and fairness. Unfortunately, the requirement of knowing…

Performance · Computer Science 2019-07-11 Matteo Dell'Amico

Parallel solutions for ordinal scheduling with a small number of machines

We study ordinal makespan scheduling on small numbers of identical machines, with respect to two parallel solutions. In ordinal scheduling, it is known that jobs are sorted by non-increasing sizes, but the specific sizes are not known in…

Data Structures and Algorithms · Computer Science 2022-10-17 Leah Epstein

Worst-case Optimal Binary Join Algorithms under General $\ell_p$ Constraints

Worst-case optimal join algorithms have so far been studied in two broad contexts -- $(1)$ when we are given input relation sizes [Atserias et al., FOCS 2008, Ngo et al., PODS 2012, Veldhuizen, ICDT 2014] $(2)$ when in addition to…

Databases · Computer Science 2021-12-03 Sai Vikneshwar Mani Jayaraman, Corey Ropell, Atri Rudra