Related papers: LLMapReduce: Multi-Level Map-Reduce for High Perfo…

The Family of MapReduce and Large Scale Data Processing Systems

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…

Databases · Computer Science 2013-02-14 Sherif Sakr, Anna Liu, Ayman G. Fayoumi

BSP vs MapReduce

The MapReduce framework has been generating a lot of interest in a wide range of areas. It has been widely adopted in industry and has been used to solve a number of non-trivial problems in academia. Putting MapReduce on strong theoretical…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-06-19 Matthew Felice Pace

Iterative MapReduce for Large Scale Machine Learning

Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-15 Joshua Rosen, Neoklis Polyzotis, Vinayak Borkar, Yingyi Bu, Michael J. Carey, Markus Weimer, Tyson Condie, Raghu Ramakrishnan

A Review of CUDA, MapReduce, and Pthreads Parallel Computing Models

The advent of high performance computing (HPC) and graphics processing units (GPUs) presents an enormous computational resource for large data transactions (big data) that require parallel processing for robust and prompt data analysis. While…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-17 Kato Mivule, Benjamin Harvey, Crystal Cobb, Hoda El Sayed

MapReduce Scheduler: A 360-degree view

Undoubtedly, MapReduce is the most powerful programming paradigm in distributed computing. Enhancing MapReduce is essential, as it can make computing faster. Therefore, there are many scheduling algorithms to discuss…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-11 Rajdeep Das, Rohit Pratap Singh, Ripon Patgiri

Parallel Sorted Neighborhood Blocking with MapReduce

Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges and possible solutions of using the MapReduce programming model for parallel entity…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-10-18 Lars Kolb, Andreas Thor, Erhard Rahm

The Efficiency of MapReduce in Parallel External Memory

Since its introduction in 2004, the MapReduce framework has become one of the standard approaches in massive distributed and parallel computation. In contrast to its intensive use in practice, its theoretical footing is still limited and only…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-12-19 Gero Greiner, Riko Jacob

Towards co-designed optimizations in parallel frameworks: A MapReduce case study

The explosion of Big Data was followed by the proliferation of numerous complex parallel software stacks whose aim is to tackle the challenges of data deluge. A drawback of such a multi-layered hierarchical deployment is the inability to…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-01 Colin Barrett, Christos Kotselidis, Mikel Luján

Simulating Parallel Algorithms in the MapReduce Framework with Applications to Parallel Computational Geometry

In this paper, we describe efficient MapReduce simulations of parallel algorithms specified in the BSP and PRAM models. We also provide some applications of these simulation results to problems in parallel computational geometry for the…

Data Structures and Algorithms · Computer Science 2015-03-14 Michael T. Goodrich

Sorting, Searching, and Simulation in the MapReduce Framework

In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the usefulness of our approach by designing and analyzing efficient MapReduce algorithms for fundamental sorting, searching, and simulation…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-01-11 Michael T. Goodrich, Nodari Sitchinava, Qin Zhang

Security and Privacy Aspects in MapReduce on Clouds: A Survey

MapReduce is a programming system for distributed processing of large-scale data in an efficient and fault-tolerant manner on a private, public, or hybrid cloud. MapReduce is extensively used daily around the world as an efficient distributed…

Databases · Computer Science 2016-05-04 Philip Derbeko, Shlomi Dolev, Ehud Gudes, Shantanu Sharma

Coded MapReduce

MapReduce is a commonly used framework for executing data-intensive jobs on distributed server clusters. We introduce a variant implementation of MapReduce, namely "Coded MapReduce", to substantially reduce the inter-server communication…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-12-08 Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

MapReduce Meets Fine-Grained Complexity: MapReduce Algorithms for APSP, Matrix Multiplication, 3-SUM, and Beyond

Distributed processing frameworks, such as MapReduce, Hadoop, and Spark are popular systems for processing large amounts of data. The design of efficient algorithms in these frameworks is a challenging problem, as the systems both require…

Data Structures and Algorithms · Computer Science 2019-05-07 MohammadTaghi Hajiaghayi, Silvio Lattanzi, Saeed Seddighin, Cliff Stein

An Alternative C++ based HPC system for Hadoop MapReduce

MapReduce is a technique used to vastly improve distributed processing of data and can massively speed up computation. Hadoop and its MapReduce implementation rely on the JVM and Java, which are expensive in memory. High Performance Computing based MapReduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-29 Vignesh S., Muthumanikandan V., Siddarth S., Sainath G

Petuum: A New Platform for Distributed Machine Learning on Big Data

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization…

Machine Learning · Statistics 2015-05-18 Eric P. Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu

Complexity Measures for Map-Reduce, and Comparison to Parallel Computing

The programming paradigm Map-Reduce and its main open-source implementation, Hadoop, have had an enormous impact on large scale data processing. Our goal in this expository writeup is two-fold: first, we want to present some complexity…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-11-29 Ashish Goel, Kamesh Munagala

Parallel Processing of Large Graphs

More and more large data collections are gathered worldwide in various IT systems. Many of them are networked in nature and need to be processed and analysed as graph structures. Due to their size, they very often require the use of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-06-04 Tomasz Kajdanowicz, Przemyslaw Kazienko, Wojciech Indyk

GraphLab: A New Framework For Parallel Machine Learning

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…

Machine Learning · Computer Science 2014-08-12 Yucheng Low, Joseph E. Gonzalez, Aapo Kyrola, Danny Bickson, Carlos E. Guestrin, Joseph Hellerstein

GraphLab: A New Framework for Parallel Machine Learning

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…

Machine Learning · Computer Science 2010-06-28 Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Joseph M. Hellerstein

Map-Reduce for Multiprocessing Large Data and Multi-threading for Data Scraping

This document is the final project report for our advanced operating system class. During this project, we mainly focused on applying multiprocessing and multi-threading technology to our whole project and utilized the map-reduce algorithm…

Numerical Analysis · Mathematics 2023-12-27 Zefeng Qiu, Prashanth Umapathy, Qingquan Zhang, Guanqun Song, Ting Zhu