Related papers: Communication Cost in Parallel Query Processing
We study the problem of computing a conjunctive query q in parallel, using p of servers, on a large database. We consider algorithms with one round of communication, and study the complexity of the communication. We are especially…
We consider the problem of computing a relational query $q$ on a large input database of size $n$, using a large number $p$ of servers. The computation is performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$ bits…
In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with $p$ servers. In contrast to previous work, where upper and lower bounds on the…
We study the problem of computing a full Conjunctive Query in parallel using $p$ heterogeneous machines. Our computational model is similar to the MPC model, but each machine has its own cost function mapping from the number of bits it…
Handling skew is one of the major challenges in query processing. In distributed computational environments such as MapReduce, uneven distribution of the data to the servers is not desired. One of the dominant measures that we want to…
A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over…
In this paper we study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "embarrassingly parallel," the finer we partition the work of the reducers so that more parallelism can…
A common scenario in distributed computing involves a client who asks a server to perform a computation on a remote computer. An important problem is to determine the minimum amount of communication needed to specify the desired…
We study the scalability of consensus-based distributed optimization algorithms by considering two questions: How many processors should we use for a given problem, and how often should they communicate when communication is not free?…
Sketching is widely used in randomized linear algebra for low-rank matrix approximation, column subset selection, and many other problems, and it has gained significant traction in machine learning applications. However, sketching large…
Most of the prior work in massively parallel data processing assumes homogeneity, i.e., every computing unit has the same computational capability, and can communicate with every other unit with the same latency and bandwidth. However, this…
Mass spectrometry (MS) based omics data analysis require significant time and resources. To date, few parallel algorithms have been proposed for deducing peptides from mass spectrometry-based data. However, these parallel algorithms were…
In this article, we focus on the parallel communication cost of multiplying the same vector along two modes of a $3$-dimensional symmetric tensor. This is a key computation in the higher-order power method for determining eigenpairs of a…
In this paper we study the problem of dynamically maintaining graph properties under batches of edge insertions and deletions in the massively parallel model of computation. In this setting, the graph is stored on a number of machines, each…
Data shuffling between distributed cluster of nodes is one of the critical steps in implementing large-scale learning algorithms. Randomly shuffling the data-set among a cluster of workers allows different nodes to obtain fresh data…
This paper studies the distributed linearly separable computation problem, which is a generalization of many existing distributed computing problems such as distributed gradient descent and distributed linear transform. In this problem, a…
The cost of communication is a substantial factor affecting the scalability of many distributed applications. Every message sent can incur a cost in storage, computation, energy and bandwidth. Consequently, reducing the communication costs…
Information theoretically secure multi-party computation (MPC) is a central primitive of modern cryptography. However, relatively little is known about the communication complexity of this primitive. In this work, we develop powerful…
In the research area of parallel computation, the communication cost has been extensively studied, while the IO cost has been neglected. For big data computation, the assumption that the data fits in main memory no longer holds, and…
Roughgarden, Vassilvitskii, and Wang (JACM 18) recently introduced a novel framework for proving lower bounds for Massively Parallel Computation using techniques from boolean function complexity. We extend their framework in two different…