Related papers: Computing Marginals Using MapReduce

Assignment of Different-Sized Inputs in MapReduce

A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers, such that for each required output there exists a reducer that receives all the inputs that participate in the computation of this…

Databases · Computer Science 2015-01-28 Foto Afrati, Shlomi Dolev, Ephraim Korach, Shantanu Sharma, Jeffrey D. Ullman

Assignment Problems of Different-Sized Inputs in MapReduce

A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers, such that for each required output there exists a reducer that receives all the inputs that participate in the computation of this…

Databases · Computer Science 2016-10-21 Foto Afrati, Shlomi Dolev, Ephraim Korach, Shantanu Sharma, Jeffrey D. Ullman

On the Computation Rate of All-Reduce

In the All-Reduce problem, each one of the K nodes holds an input and wishes to compute the sum of all K inputs through a communication network where each pair of nodes is connected by a parallel link with arbitrary bandwidth. The…

Information Theory · Computer Science 2026-02-27 Yufeng Zhou, Hua Sun

A Reduced Offset Based Method for Fast Computation of the Prime Implicants Covering a Given Cube

In order to generate prime implicants for a given cube (minterm), most of minimization methods increase the dimension of this cube by removing one literal from it at a time. But there are two problems of exponential complexity. One of them…

Data Structures and Algorithms · Computer Science 2010-01-12 Fatih Basciftci, Sirzat Kahramanli

Solving $k$-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially

Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular $k$-center variant which, given a set $S$ of points from some metric space and a…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-02 Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci

Fast Clustering using MapReduce

Clustering problems have numerous applications and are becoming more challenging as the size of the data increases. In this paper, we consider designing clustering algorithms that can be used in MapReduce, the most popular programming…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-09-09 Alina Ene, Sungjin Im, Benjamin Moseley

Space-Round Tradeoffs for MapReduce Computations

This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by…

Data Structures and Algorithms · Computer Science 2013-06-13 Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal

Submodular Optimization in the MapReduce Model

Submodular optimization has received significant attention in both practice and theory, as a wide array of problems in machine learning, auction theory, and combinatorial optimization have submodular structure. In practice, these problems…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-04 Paul Liu, Jan Vondrak

Accurate MapReduce Algorithms for $k$-median and $k$-means in General Metric Spaces

Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular $k$-median and $k$-means variants which, given a set $P$ of points from a metric…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-01 Alessio Mazzetto, Andrea Pietracaprina, Geppino Pucci

Enumerating Subgraph Instances Using Map-Reduce

The theme of this paper is how to find all instances of a given "sample" graph in a larger "data graph," using a single round of map-reduce. For the simplest sample graph, the triangle, we improve upon the best known such algorithm. We then…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-11-22 Foto N. Afrati, Dimitris Fotakis, Jeffrey D. Ullman

On the Computational Complexity of MapReduce

In this paper we study MapReduce computations from a complexity-theoretic perspective. First, we formulate a uniform version of the MRC model of Karloff et al. (2010). We then show that the class of regular languages, and moreover all of…

Computational Complexity · Computer Science 2015-10-07 Benjamin Fish, Jeremy Kun, Ádám Dániel Lelkes, Lev Reyzin, György Turán

Semi-MapReduce Meets Congested Clique

Graph problems are troublesome when it comes to MapReduce. Typically, to be able to design algorithms that make use of the advantages of MapReduce, assumptions beyond what the model imposes, such as the density of the input graph, are…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-15 Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi

Round Compression for Parallel Graph Algorithms in Strongly Sublinear Space

The Massive Parallel Computation (MPC) model is a theoretical framework for popular parallel and distributed platforms such as MapReduce, Hadoop, or Spark. We consider the task of computing a large matching or small vertex cover in this…

Data Structures and Algorithms · Computer Science 2018-07-24 Krzysztof Onak

Some Pairs Problems

A common form of MapReduce application involves discovering relationships between certain pairs of inputs. Similarity joins serve as a good example of this type of problem, which we call a "some-pairs" problem. In the framework of Afrati et…

Databases · Computer Science 2016-02-04 Jeffrey D. Ullman, Jonathan Ullman

Upper and Lower Bounds on the Cost of a Map-Reduce Computation

In this paper we study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "embarrassingly parallel," the finer we partition the work of the reducers so that more parallelism can…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-06-21 Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey D. Ullman

Connecting MapReduce Computations to Realistic Machine Models

We explain how the popular, highly abstract MapReduce model of parallel computation (MRC) can be rooted in reality by explaining how it can be simulated on realistic distributed-memory parallel machine models like BSP. We first refine the…

Data Structures and Algorithms · Computer Science 2020-02-19 Peter Sanders

MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension

Given a dataset of points in a metric space and an integer $k$, a diversity maximization problem requires determining a subset of $k$ points maximizing some diversity objective measure, e.g., the minimum or the average distance between two…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-24 Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

Computing optimal k-regret minimizing sets with top-k depth contours

Regret minimizing sets are a very recent approach to representing a dataset D with a small subset S of representative tuples. The set S is chosen such that executing any top-1 query on S rather than D is minimally perceptible to any user.…

Databases · Computer Science 2012-07-27 Sean Chester, Alex Thomo, S. Venkatesh, Sue Whitesides

Simulating Parallel Algorithms in the MapReduce Framework with Applications to Parallel Computational Geometry

In this paper, we describe efficient MapReduce simulations of parallel algorithms specified in the BSP and PRAM models. We also provide some applications of these simulation results to problems in parallel computational geometry for the…

Data Structures and Algorithms · Computer Science 2015-03-14 Michael T. Goodrich

Approximate Clustering via Metric Partitioning

In this paper we consider two metric covering/clustering problems - Minimum Cost Covering Problem (MCC) and $k$-clustering. In the MCC problem, we are given two point sets $X$ (clients) and $Y$ (servers), and a metric on $X \cup…

Computational Geometry · Computer Science 2016-10-05 Sayan Bandyapadhyay, Kasturi Varadarajan