Related papers: Towards co-designed optimizations in parallel fram…

The Family of MapReduce and Large Scale Data Processing Systems

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…

Databases · Computer Science 2013-02-14 Sherif Sakr , Anna Liu , Ayman G. Fayoumi

Efficient Computation of the Well-Founded Semantics over Big Data

Data originating from the Web, sensor readings and social media result in increasingly huge datasets. The so called Big Data comes with new scientific and technological challenges while creating new opportunities, hence the increasing…

Artificial Intelligence · Computer Science 2020-02-19 Ilias Tachmazidis , Grigoris Antoniou , Wolfgang Faber

Optimizing MapReduce for Highly Distributed Environments

MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-31 Benjamin Heintz , Abhishek Chandra , Ramesh K. Sitaraman

Sorting, Searching, and Simulation in the MapReduce Framework

In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the usefulness of our approach by designing and analyzing efficient MapReduce algorithms for fundamental sorting, searching, and simulation…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-01-11 Michael T. Goodrich , Nodari Sitchinava , Qin Zhang

LLMapReduce: Multi-Level Map-Reduce for High Performance Data Analysis

The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-12-13 Chansup Byun , Jeremy Kepner , William Arcand , David Bestor , Bill Bergeron , Vijay Gadepally , Matthew Hubbell , Peter Michaleas , Julie Mullen , Andrew Prout , Antonio Rosa , Charles Yee , Albert Reuther

Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

MapReduce is a popular programming paradigm for developing large-scale, data-intensive computation. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become…

Databases · Computer Science 2018-06-20 Maaz Bin Safeer Ahmad , Alvin Cheung

The Efficiency of MapReduce in Parallel External Memory

Since its introduction in 2004, the MapReduce framework has become one of the standard approaches in massive distributed and parallel computation. In contrast to its intensive use in practise, theoretical footing is still limited and only…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-12-19 Gero Greiner , Riko Jacob

Coded MapReduce

MapReduce is a commonly used framework for executing data-intensive jobs on distributed server clusters. We introduce a variant implementation of MapReduce, namely "Coded MapReduce", to substantially reduce the inter-server communication…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-12-08 Songze Li , Mohammad Ali Maddah-Ali , A. Salman Avestimehr

Relaxing Concurrent Data-structure Semantics for Increasing Performance: A Multi-structure 2D Design Framework

There has been a significant amount of work in the literature proposing semantic relaxation of concurrent data structures for improving scalability and performance. By relaxing the semantics of a data structure, a bigger design space, that…

Data Structures and Algorithms · Computer Science 2025-11-11 Adones Rukundo , Aras Atalar , Philippas Tsigas

Iterative MapReduce for Large Scale Machine Learning

Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-15 Joshua Rosen , Neoklis Polyzotis , Vinayak Borkar , Yingyi Bu , Michael J. Carey , Markus Weimer , Tyson Condie , Raghu Ramakrishnan

A Survey of Semantics-Aware Performance Optimization for Data-Intensive Computing

We are living in the era of Big Data and witnessing the explosion of data. Given that the limitation of CPU and I/O in a single computer, the mainstream approach to scalability is to distribute computations among a large number of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-27 Bingbing Rao , Liqiang Wang

Automatic Optimization for MapReduce Programs

The MapReduce distributed programming framework has become popular, despite evidence that current implementations are inefficient, requiring far more hardware than a traditional relational databases to complete similar tasks. MapReduce jobs…

Databases · Computer Science 2011-04-19 Eaman Jahani , Michael J. Cafarella , Christopher Ré

An Alternative C++ based HPC system for Hadoop MapReduce

MapReduce is a technique used to vastly improve distributed processing of data and can massively speed up computation. Hadoop and its MapReduce relies on JVM and Java which is expensive on memory. High Performance Computing based MapReduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-29 Vignesh S. , Muthumanikandan V. , Siddarth S. , Sainath G

Simulating Parallel Algorithms in the MapReduce Framework with Applications to Parallel Computational Geometry

In this paper, we describe efficient MapReduce simulations of parallel algorithms specified in the BSP and PRAM models. We also provide some applications of these simulation results to problems in parallel computational geometry for the…

Data Structures and Algorithms · Computer Science 2015-03-14 Michael T. Goodrich

MapReduce Meets Fine-Grained Complexity: MapReduce Algorithms for APSP, Matrix Multiplication, 3-SUM, and Beyond

Distributed processing frameworks, such as MapReduce, Hadoop, and Spark are popular systems for processing large amounts of data. The design of efficient algorithms in these frameworks is a challenging problem, as the systems both require…

Data Structures and Algorithms · Computer Science 2019-05-07 MohammadTaghi Hajiaghayi , Silvio Lattanzi , Saeed Seddighin , Cliff Stein

Locality-Aware Hybrid Coded MapReduce for Server-Rack Architecture

MapReduce is a widely used framework for distributed computing. Data shuffling between the Map phase and Reduce phase of a job involves a large amount of data transfer across servers, which in turn accounts for increase in job completion…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-06 Sneh Gupta , V. Lalitha

Leveraging Parallel Data Processing Frameworks with Verified Lifting

Many parallel data frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten to the domain-specific…

Programming Languages · Computer Science 2016-11-24 Maaz Bin Safeer Ahmad , Alvin Cheung

Analyzing Large-Scale, Distributed and Uncertain Data

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

Parallel Processing of Large Graphs

More and more large data collections are gathered worldwide in various IT systems. Many of them possess the networked nature and need to be processed and analysed as graph structures. Due to their size they require very often usage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-06-04 Tomasz Kajdanowicz , Przemyslaw Kazienko , Wojciech Indyk

BSP vs MapReduce

The MapReduce framework has been generating a lot of interest in a wide range of areas. It has been widely adopted in industry and has been used to solve a number of non-trivial problems in academia. Putting MapReduce on strong theoretical…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-06-19 Matthew Felice Pace