Communication Steps for Parallel Query Processing
Abstract
We consider the problem of computing a relational query on a large input database of size , using a large number of servers. The computation is performed in rounds, and each server can receive only bits of data, where is a parameter that controls replication. We examine how many global communication steps are needed to compute . We establish both lower and upper bounds, in two settings. For a single round of communication, we give lower bounds in the strongest possible model, where arbitrary bits may be exchanged; we show that any algorithm requires , where is the fractional vertex cover of the hypergraph of . We also give an algorithm that matches the lower bound for a specific class of databases. For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. We show that for the class of tree-like queries there exists a tradeoff between the number of rounds and the space exponent . The lower bounds for multiple rounds are the first of their kind. Our results also imply that transitive closure cannot be computed in O(1) rounds of communication.
Cite
@article{arxiv.1306.5972,
title = {Communication Steps for Parallel Query Processing},
author = {Paul Beame and Paraschos Koutris and Dan Suciu},
journal= {arXiv preprint arXiv:1306.5972},
year = {2013}
}