English

GYM: A Multiround Join Algorithm In MapReduce

Databases 2017-01-27 v8

Abstract

Multiround algorithms are now commonly used in distributed data processing systems, yet the extent to which algorithms can benefit from running more rounds is not well understood. This paper answers this question for a spectrum of rounds for the problem of computing the equijoin of nn relations. Specifically, given any query QQ with width \w\w, {\em intersection width} \iw\iw, input size IN\mathrm{IN}, output size OUT\mathrm{OUT}, and a cluster of machines with MM memory available per machine, we show that: (1) QQ can be computed in O(n)O(n) rounds with O(n(IN\w+OUT)2M)O(n\frac{(\mathrm{IN}^{\w} + \mathrm{OUT})^2}{M}) communication cost. (2) QQ can be computed in O(log(n))O(\log(n)) rounds with O(n(INmax(\w,3\iw)+OUT)2M)O(n\frac{(\mathrm{IN}^{\max(\w, 3\iw)} + \mathrm{OUT})^2}{M}) communication cost. \end{itemize} Intersection width is a new notion of queries and generalized hypertree decompositions (GHDs) of queries we introduce to capture how connected the adjacent cyclic components of the GHDs are. We achieve our first result by introducing a distributed and generalized version of Yannakakis's algorithm, called GYM. GYM takes as input any GHD of QQ with width \w\w and depth dd, and computes QQ in O(d+log(n))O(d + \log(n)) rounds and O(n(IN\w+OUT)2M)O(n\frac{(\mathrm{IN}^{\w} + \mathrm{OUT})^2}{M}) communication cost. We achieve our second result by showing how to construct GHDs of QQ with width max(\w,3\iw)\max(\w, 3\iw) and depth O(log(n))O(\log(n)). We describe another technique to construct GHDs with longer widths and shorter depths, demonstrating a spectrum of tradeoffs one can make between communication and the number of rounds.

Keywords

Cite

@article{arxiv.1410.4156,
  title  = {GYM: A Multiround Join Algorithm In MapReduce},
  author = {Foto Afrati and Manas Joglekar and Christopher Ré and Semih Salihoglu and Jeffrey D. Ullman},
  journal= {arXiv preprint arXiv:1410.4156},
  year   = {2017}
}
R2 v1 2026-06-22T06:24:53.617Z