Related papers: Some Pairs Problems
A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers, such that for each required output there exists a reducer that receives all the inputs that participate in the computation of this…
Distributed processing frameworks, such as MapReduce, Hadoop, and Spark, are popular systems for processing large amounts of data. The design of efficient algorithms in these frameworks is a challenging problem, as the systems both require…
We consider the following problem: we have a high-resolution street network of a given city, and low-resolution measurements of traffic within this city. We want to associate with each measurement the set of streets corresponding to the…
We present here a more general version of the balanced pair algorithm. This version works in the reducible case and terminates more often than the standard algorithm. We present examples to illustrate this point. Lastly, we discuss the…
Submodular optimization has received significant attention in both practice and theory, as a wide array of problems in machine learning, auction theory, and combinatorial optimization have submodular structure. In practice, these problems…
In this paper we study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "embarrassingly parallel," the finer we partition the work of the reducers so that more parallelism can…
MapReduce (and its open-source implementation Hadoop) has become the de facto platform for processing large data sets. MapReduce offers a streamlined computational framework by interleaving sequential and parallel computation while hiding…
In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the usefulness of our approach by designing and analyzing efficient MapReduce algorithms for fundamental sorting, searching, and simulation…
The MapReduce framework has been generating a lot of interest in a wide range of areas. It has been widely adopted in industry and has been used to solve a number of non-trivial problems in academia. Putting MapReduce on strong theoretical…
In this paper we propose and prove that cyclic quorum sets can efficiently manage all-pairs computations and data replication. The quorums are O(N/sqrt(P)) in size, up to 50% smaller than the dual N/sqrt(P) array implementations, and…
A common problem in machine learning is to rank a set of n items based on pairwise comparisons. Here ranking refers to partitioning the items into sets of pre-specified sizes according to their scores, which includes identification of the…
We study three-way joins on MapReduce. Joins are very useful in a multitude of applications from data integration and traversing social networks, to mining graphs and automata-based constructions. However, joins are expensive, even for…
Fully pairing all elements of a set while attempting to maximize the total benefit is a combinatorially difficult problem. Such pairing problems naturally appear in various situations in science, technology, economics, and other fields. In…
In this paper we study a worst case to average case reduction for the problem of matrix multiplication over finite fields. Suppose we have an efficient average case algorithm, that given two random matrices $A,B$ outputs a matrix that has a…
This paper considers pairs of optimization problems that are defined from a single input and for which it is desired to find a good approximation to either one of the problems. In many instances, it is possible to efficiently find an…
The paper considers the problem of finding the number of dominant voters in two-level voting procedures. At the first stage, voting is conducted among local groups of voters, and at the second stage, the results are aggregated to form a…
This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by…
The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…
In this paper, we investigate the problem of computing a multiway join in one round of MapReduce when the data may be skewed. We optimize on communication cost, i.e., the amount of data that is transferred from the mappers to the reducers.…