Related papers: GYM: A Multiround Join Algorithm In MapReduce
Handling skew is one of the major challenges in query processing. In distributed computational environments such as MapReduce, uneven distribution of the data to the servers is not desired. One of the dominant measures that we want to…
In this paper, we investigate the problem of computing a multiway join in one round of MapReduce when the data may be skewed. We optimize on communication cost, i.e., the amount of data that is transferred from the mappers to the reducers.…
We study the problem of computing conjunctive queries over large databases on parallel architectures without shared storage. Using the structure of such a query $q$ and the skew in the data, we study tradeoffs between the number of…
We optimize multiway equijoins on relational tables using degree information. We give a new bound that uses degree information to more tightly bound the maximum output size of a query. On real data, our bound on the number of triangles in a…
We are presented with a graph, $G$, on $n$ vertices with $m$ edges whose edge set is unknown. Our goal is to learn the edges of $G$ with as few queries to an oracle as possible. When we submit a set $S$ of vertices to the oracle, it tells…
The high cost of communicating gradients is a major bottleneck for federated learning, as the bandwidth of the participating user devices is limited. Existing gradient compression algorithms are mainly designed for data centers with…
This paper investigates the energy complexity of distributed graph problems in multi-hop radio networks, where the energy cost of an algorithm is measured by the maximum number of awake rounds of a vertex. Recent works revealed that some…
We propose a new method for estimating the number of answers OUT of a small join query Q in a large database D, and for uniform sampling over joins. Our method is the first to satisfy all the following statements. - Support arbitrary Q,…
We study three-way joins on MapReduce. Joins are very useful in a multitude of applications from data integration and traversing social networks, to mining graphs and automata-based constructions. However, joins are expensive, even for…
Given a large graph G = (V,E) with millions of nodes and edges, how do we compute its connected components efficiently? Recent work addresses this problem in map-reduce, where a fundamental trade-off exists between the number of map-reduce…
The computation of the diameter is one of the most central problems in distributed computation. In the standard CONGEST model, in which two adjacent nodes can exchange $O(\log n)$ bits per round (here $n$ denotes the number of nodes of the…
In this paper, we study systems of distributed entities that can actively modify their communication network. This gives rise to distributed algorithms that apart from communication can also exploit network reconfiguration in order to carry…
Can Grover's algorithm speed up search of a physical region - for example a 2-D grid of size sqrt(n) by sqrt(n)? The problem is that sqrt(n) time seems to be needed for each query, just to move amplitude across the grid. Here we show that…
We initiate the study of diameter computation in geometric intersection graphs from the fine-grained complexity perspective. A geometric intersection graph is a graph whose vertices correspond to some shapes in $d$-dimensional Euclidean…
In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful…
The congested clique model is a message-passing model of distributed computation where the underlying communication network is the complete graph of $n$ nodes. In this paper we consider the situation where the joint input to the nodes is an…
We consider the problem of finding a maximal independent set (MIS) in the shared blackboard communication model with vertex-partitioned inputs. There are $n$ players corresponding to vertices of an undirected graph, and each player sees the…
We show how to construct an overlay network of constant degree and diameter $O(\log n)$ in time $O(\log n)$ starting from an arbitrary weakly connected graph. We assume a synchronous communication network in which nodes can send messages to…
In the semi-streaming model for processing massive graphs, an algorithm makes multiple passes over the edges of a given $n$-vertex graph and is tasked with computing the solution to a problem using $O(n \cdot \text{polylog}(n))$ space.…
The problems of computing eccentricity, radius, and diameter are fundamental to graph theory. These parameters are intrinsically defined based on the distance metric of the graph. In this work, we propose quantum algorithms for the diameter…