Related papers: Optimizing MapReduce for Highly Distributed Enviro…

Analyzing Large-Scale, Distributed and Uncertain Data

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

The Family of MapReduce and Large Scale Data Processing Systems

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…

Databases · Computer Science 2013-02-14 Sherif Sakr , Anna Liu , Ayman G. Fayoumi

A Game-Theoretic Approach for Runtime Capacity Allocation in MapReduce

Nowadays many companies have available large amounts of raw, unstructured data. Among Big Data enabling technologies, a central place is held by the MapReduce framework and, in particular, by its open source implementation, Apache Hadoop.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-18 Eugenio Gianniti , Danilo Ardagna , Michele Ciavotta , Mauro Passacantando

Hadoop Performance Models

Hadoop MapReduce is now a popular choice for performing large-scale data analytics. This technical report describes a detailed set of mathematical performance models for describing the execution of a MapReduce job on Hadoop. The models…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-06-07 Herodotos Herodotou

An Alternative C++ based HPC system for Hadoop MapReduce

MapReduce is a technique used to vastly improve distributed processing of data and can massively speed up computation. Hadoop and its MapReduce relies on JVM and Java which is expensive on memory. High Performance Computing based MapReduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-29 Vignesh S. , Muthumanikandan V. , Siddarth S. , Sainath G

Iterative MapReduce for Large Scale Machine Learning

Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-15 Joshua Rosen , Neoklis Polyzotis , Vinayak Borkar , Yingyi Bu , Michael J. Carey , Markus Weimer , Tyson Condie , Raghu Ramakrishnan

A Survey on Geographically Distributed Big-Data Processing using MapReduce

Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many…

Databases · Computer Science 2017-07-07 Shlomi Dolev , Patricia Florissi , Ehud Gudes , Shantanu Sharma , Ido Singer

Observations on Factors Affecting Performance of MapReduce based Apriori on Hadoop Cluster

Designing fast and scalable algorithm for mining frequent itemsets is always being a most eminent and promising problem of data mining. Apriori is one of the most broadly used and popular algorithm of frequent itemset mining. Designing…

Databases · Computer Science 2017-01-24 Sudhakar Singh , Rakhi Garg , P. K. Mishra

An Experimental Evaluation of Performance of A Hadoop Cluster on Replica Management

Hadoop is an open source implementation of the MapReduce Framework in the realm of distributed processing. A Hadoop cluster is a unique type of computational cluster designed for storing and analyzing large data sets across cluster of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-10 Muralikrishnan Ramane , Sharmila Krishnamoorthy , Sasikala Gowtham

A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks

This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks. The MapReduce programming model and its widely-used open-source…

Networking and Internet Architecture · Computer Science 2019-10-03 Sanaa Hamid Mohamed , Taisir E. H. El-Gorashi , Jaafar M. H. Elmirghani

MapReduce Meets Fine-Grained Complexity: MapReduce Algorithms for APSP, Matrix Multiplication, 3-SUM, and Beyond

Distributed processing frameworks, such as MapReduce, Hadoop, and Spark are popular systems for processing large amounts of data. The design of efficient algorithms in these frameworks is a challenging problem, as the systems both require…

Data Structures and Algorithms · Computer Science 2019-05-07 MohammadTaghi Hajiaghayi , Silvio Lattanzi , Saeed Seddighin , Cliff Stein

Scheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds

MapReduce has become a popular programming model for running data intensive applications on the cloud. Completion time goals or deadlines of MapReduce jobs set by users are becoming crucial in existing cloud-based data processing…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-08-10 B. Thirumala Rao , L. S. S. Reddy

Testing MapReduce-Based Systems

MapReduce (MR) is the most popular solution to build applications for large-scale data processing. These applications are often deployed on large clusters of commodity machines, where failures happen constantly due to bugs, hardware…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-11 João Eugenio Marynowski , Michel Albonico , Eduardo Cunha de Almeida , Gerson Sunyé

Survey on Improved Scheduling in Hadoop MapReduce in Cloud Environments

Cloud Computing is emerging as a new computational paradigm shift. Hadoop-MapReduce has become a powerful Computation Model for processing large data on distributed commodity hardware clusters such as Clouds. In all Hadoop implementations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-04 B. Thirumala Rao , L. S. S. Reddy

Coded MapReduce

MapReduce is a commonly used framework for executing data-intensive jobs on distributed server clusters. We introduce a variant implementation of MapReduce, namely "Coded MapReduce", to substantially reduce the inter-server communication…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-12-08 Songze Li , Mohammad Ali Maddah-Ali , A. Salman Avestimehr

Review of Apriori Based Algorithms on MapReduce Framework

The Apriori algorithm that mines frequent itemsets is one of the most popular and widely used data mining algorithms. Now days many algorithms have been proposed on parallel and distributed platforms to enhance the performance of Apriori…

Databases · Computer Science 2017-02-22 Sudhakar Singh , Rakhi Garg , P. K. Mishra

Running genetic algorithms on Hadoop for solving high dimensional optimization problems

Hadoop is a popular MapReduce framework for developing parallel applications in distributed environments. Several advantages of MapReduce such as programming ease and ability to use commodity hardware make the applicability of soft…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-13 Güngör Yildirim , İbrahim R Hallac , Galip Aydin , Yetkin Tatar

How to Optimally Allocate Resources for Coded Distributed Computing?

Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute…

Information Theory · Computer Science 2017-02-24 Qian Yu , Songze Li , Mohammad Ali Maddah-Ali , A. Salman Avestimehr

High Performance Risk Aggregation: Addressing the Data Processing Challenge the Hadoop MapReduce Way

Monte Carlo simulations employed for the analysis of portfolios of catastrophic risk process large volumes of data. Often times these simulations are not performed in real-time scenarios as they are slow and consume large data. Such…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-25 Zhimin Yao , Blesson Varghese , Andrew Rau-Chaplin

Using MapReduce for Large-scale Medical Image Analysis

The growth of the amount of medical image data produced on a daily basis in modern hospitals forces the adaptation of traditional medical image analysis and indexing approaches towards scalable solutions. The number of images and their…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-26 Dimitrios Markonis , Roger Schaer , Ivan Eggel , Henning Müller , Adrien Depeursinge