Related papers: Distributed Log Analysis on the Cloud Using MapRed…

Analyzing Web Application Log Files to Find Hit Count Through the Utilization of Hadoop MapReduce in Cloud Computing Environment

MapReduce has been widely applied to data- and compute-intensive applications, and it is an important programming model for cloud computing. Hadoop is an open-source implementation of MapReduce which operates on terabytes…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-12-01 Sayalee Narkhede, Trupti Baraskar, Debajyoti Mukhopadhyay

Design Architecture-Based on Web Server and Application Cluster in Cloud Environment

The cloud has been a computational and storage solution for many data-centric organizations. The problem those organizations face today is searching data in the cloud efficiently. A framework is required to distribute the…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-03-24 Gita Shah, Annappa, K. C. Shet

Coded MapReduce

MapReduce is a commonly used framework for executing data-intensive jobs on distributed server clusters. We introduce a variant implementation of MapReduce, namely "Coded MapReduce", to substantially reduce the inter-server communication…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-12-08 Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Optimizing MapReduce for Highly Distributed Environments

MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-31 Benjamin Heintz, Abhishek Chandra, Ramesh K. Sitaraman

LogDB: Multivariate Log-based Failure Diagnosis for Distributed Databases (Extended from MultiLog)

Distributed databases, as the core infrastructure software for internet applications, play a critical role in modern cloud services. However, existing distributed databases frequently experience system failures and performance degradation,…

Databases · Computer Science 2025-05-06 Lingzhe Zhang, Tong Jia, Mengxi Jia, Ying Li

Security and Privacy Aspects in MapReduce on Clouds: A Survey

MapReduce is a programming system for distributed processing of large-scale data in an efficient and fault-tolerant manner on a private, public, or hybrid cloud. MapReduce is extensively used daily around the world as an efficient distributed…

Databases · Computer Science 2016-05-04 Philip Derbeko, Shlomi Dolev, Ehud Gudes, Shantanu Sharma

Analyzing Large-Scale, Distributed and Uncertain Data

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

Distributed Spatial Data Clustering as a New Approach for Big Data Analysis

In this paper we propose a new approach for Big Data mining and analysis. This new approach works well on distributed datasets and deals with the data clustering task of the analysis. The approach consists of two main phases, the first phase…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-05 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

The Family of MapReduce and Large Scale Data Processing Systems

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…

Databases · Computer Science 2013-02-14 Sherif Sakr, Anna Liu, Ayman G. Fayoumi

A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks

This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks. The MapReduce programming model and its widely-used open-source…

Networking and Internet Architecture · Computer Science 2019-10-03 Sanaa Hamid Mohamed, Taisir E. H. El-Gorashi, Jaafar M. H. Elmirghani

Leveraging Coding Techniques for Speeding up Distributed Computing

Large-scale clusters leveraging distributed computing frameworks such as MapReduce routinely process data that are on the order of petabytes or more. The sheer size of the data precludes the processing of the data on a single computer. The…

Information Theory · Computer Science 2018-02-12 Konstantinos Konstantinidis, Aditya Ramamoorthy

A large-scale and fault-tolerant approach of subgraph mining using density-based partitioning

Recently, graph mining approaches have become very popular, especially in domains such as bioinformatics, chemoinformatics and social networks. In this scope, one of the most challenging tasks is frequent subgraph discovery. This task has…

Databases · Computer Science 2016-08-24 Sabeur Aridhi, Laurent d'Orazio, Mondher Maddouri, Engelbert Mephu Nguifo

Network Map Reduce

Networking data analytics is increasingly used for enhanced network visibility and controllability. We draw the similarities between the Software Defined Networking (SDN) architecture and the MapReduce programming model. Inspired by the…

Networking and Internet Architecture · Computer Science 2016-09-13 Haoyu Song, Jun Gong, Hongfei Chen

On the Complexity of Processing Massive, Unordered, Distributed Data

An existing approach for dealing with massive data sets is to stream over the input in few passes and perform computations with sublinear resources. This method does not work for truly massive data where even making a single pass over the…

Computational Complexity · Computer Science 2007-05-23 Jon Feldman, S. Muthukrishnan, Anastasios Sidiropoulos, Cliff Stein, Zoya Svitkina

How to Optimally Allocate Resources for Coded Distributed Computing?

Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute…

Information Theory · Computer Science 2017-02-24 Qian Yu, Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Online Collection and Forecasting of Resource Utilization in Large-Scale Distributed Systems

Large-scale distributed computing systems often contain thousands of distributed nodes (machines). Monitoring the conditions of these nodes is important for system management purposes, which, however, can be extremely resource demanding as…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-23 Tiffany Tuor, Shiqiang Wang, Kin K. Leung, Bong Jun Ko

A communication efficient distributed learning framework for smart environments

Due to the pervasive diffusion of personal mobile and IoT devices, many "smart environments" (e.g., smart cities and smart factories) will be, among others, generators of huge amounts of data. Currently, this is typically achieved through…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-28 Lorenzo Valerio, Andrea Passarella, Marco Conti

A monitoring system for collecting and aggregating metrics from distributed clouds

Applications requiring real-time processing of large volumes of data have been the main driver for rethinking the traditional cloud, giving rise to novel cloud models. Distributed cloud (DC) is a model that allows users to dynamically…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-06 Tamara Ranković, Mateja Rilak, Janko Rakonjac, Miloš Simić

Benchmarking and Performance Modelling of MapReduce Communication Pattern

Understanding and predicting the performance of big data applications running in the cloud or on-premises could help minimise the overall cost of operations and provide opportunities in efforts to identify performance bottlenecks. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-26 Sheriffo Ceesay, Adam Barker, Yuhui Lin

An Experimental Evaluation of Performance of A Hadoop Cluster on Replica Management

Hadoop is an open source implementation of the MapReduce Framework in the realm of distributed processing. A Hadoop cluster is a unique type of computational cluster designed for storing and analyzing large data sets across cluster of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-10 Muralikrishnan Ramane, Sharmila Krishnamoorthy, Sasikala Gowtham