Related papers: Computing Marginals Using MapReduce
A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers, such that for each required output there exists a reducer that receives all the inputs that participate in the computation of this…
In the All-Reduce problem, each one of the K nodes holds an input and wishes to compute the sum of all K inputs through a communication network where each pair of nodes is connected by a parallel link with arbitrary bandwidth. The…
In order to generate prime implicants for a given cube (minterm), most minimization methods increase the dimension of this cube by removing one literal from it at a time. But there are two problems of exponential complexity. One of them…
Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular $k$-center variant which, given a set $S$ of points from some metric space and a…
Clustering problems have numerous applications and are becoming more challenging as the size of the data increases. In this paper, we consider designing clustering algorithms that can be used in MapReduce, the most popular programming…
This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by…
Submodular optimization has received significant attention in both practice and theory, as a wide array of problems in machine learning, auction theory, and combinatorial optimization have submodular structure. In practice, these problems…
Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular $k$-median and $k$-means variants which, given a set $P$ of points from a metric…
The theme of this paper is how to find all instances of a given "sample" graph in a larger "data graph," using a single round of map-reduce. For the simplest sample graph, the triangle, we improve upon the best known such algorithm. We then…
In this paper we study MapReduce computations from a complexity-theoretic perspective. First, we formulate a uniform version of the MRC model of Karloff et al. (2010). We then show that the class of regular languages, and moreover all of…
Graph problems are troublesome when it comes to MapReduce. Typically, to be able to design algorithms that make use of the advantages of MapReduce, assumptions beyond what the model imposes, such as the density of the input graph, are…
The Massively Parallel Computation (MPC) model is a theoretical framework for popular parallel and distributed platforms such as MapReduce, Hadoop, or Spark. We consider the task of computing a large matching or small vertex cover in this…
A common form of MapReduce application involves discovering relationships between certain pairs of inputs. Similarity joins serve as a good example of this type of problem, which we call a "some-pairs" problem. In the framework of Afrati et…
In this paper we study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "embarrassingly parallel," the finer we partition the work of the reducers so that more parallelism can…
We explain how the popular, highly abstract MapReduce model of parallel computation (MRC) can be rooted in reality by explaining how it can be simulated on realistic distributed-memory parallel machine models like BSP. We first refine the…
Given a dataset of points in a metric space and an integer $k$, a diversity maximization problem requires determining a subset of $k$ points maximizing some diversity objective measure, e.g., the minimum or the average distance between two…
Regret minimizing sets are a very recent approach to representing a dataset D with a small subset S of representative tuples. The set S is chosen such that executing any top-1 query on S rather than D is minimally perceptible to any user.…
In this paper, we describe efficient MapReduce simulations of parallel algorithms specified in the BSP and PRAM models. We also provide some applications of these simulation results to problems in parallel computational geometry for the…
In this paper we consider two metric covering/clustering problems - Minimum Cost Covering Problem (MCC) and $k$-clustering. In the MCC problem, we are given two point sets $X$ (clients) and $Y$ (servers), and a metric on $X \cup…