Related papers: Connecting MapReduce Computations to Realistic Mac…
Graph problems are troublesome when it comes to MapReduce. Typically, to be able to design algorithms that make use of the advantages of MapReduce, assumptions beyond what the model imposes, such as the density of the input graph, are…
In this paper, we describe efficient MapReduce simulations of parallel algorithms specified in the BSP and PRAM models. We also provide some applications of these simulation results to problems in parallel computational geometry for the…
In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the usefulness of our approach by designing and analyzing efficient MapReduce algorithms for fundamental sorting, searching, and simulation…
The Massive Parallel Computation (MPC) model is a theoretical framework for popular parallel and distributed platforms such as MapReduce, Hadoop, or Spark. We consider the task of computing a large matching or small vertex cover in this…
This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by…
Since its introduction in 2004, the MapReduce framework has become one of the standard approaches in massive distributed and parallel computation. In contrast to its intensive use in practise, theoretical footing is still limited and only…
The MapReduce framework has been generating a lot of interest in a wide range of areas. It has been widely adopted in industry and has been used to solve a number of non-trivial problems in academia. Putting MapReduce on strong theoretical…
Over the last two decades, frameworks for distributed-memory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel…
The MapReduce framework has firmly established itself as one of the most widely used parallel computing platforms for processing big data on tera- and peta-byte scale. Approaching it from a theoretical standpoint has proved to be…
We study fundamental graph problems such as graph connectivity, minimum spanning forest (MSF), and approximate maximum (weight) matching in a distributed setting. In particular, we focus on the Adaptive Massively Parallel Computation (AMPC)…
Distributed processing frameworks, such as MapReduce, Hadoop, and Spark are popular systems for processing large amounts of data. The design of efficient algorithms in these frameworks is a challenging problem, as the systems both require…
More and more large data collections are gathered worldwide in various IT systems. Many of them possess the networked nature and need to be processed and analysed as graph structures. Due to their size they require very often usage of…
In this paper we study MapReduce computations from a complexity-theoretic perspective. First, we formulate a uniform version of the MRC model of Karloff et al. (2010). We then show that the class of regular languages, and moreover all of…
The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…
In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…
The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce…
MapReduce is a technique used to vastly improve distributed processing of data and can massively speed up computation. Hadoop and its MapReduce relies on JVM and Java which is expensive on memory. High Performance Computing based MapReduce…
In this paper, we revisit the communication vs. distributed computing trade-off, studied within the framework of MapReduce in [1]. An implicit assumption in the aforementioned work is that each server performs all possible computations on…
We introduce the Adaptive Massively Parallel Computation (AMPC) model, which is an extension of the Massively Parallel Computation (MPC) model. At a high level, the AMPC model strengthens the MPC model by storing all messages sent within a…
For over a decade now we have been witnessing the success of {\em massive parallel computation} (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to…