Related papers: Distributed Log Analysis on the Cloud Using MapRed…
MapReduce has been widely applied in various fields of data and compute intensive applications and also it is important programming model for cloud computing. Hadoop is an open-source implementation of MapReduce which operates on terabytes…
Cloud has been a computational and storage solution for many data centric organizations. The problem today those organizations are facing from the cloud is in data searching in an efficient manner. A framework is required to distribute the…
MapReduce is a commonly used framework for executing data-intensive jobs on distributed server clusters. We introduce a variant implementation of MapReduce, namely "Coded MapReduce", to substantially reduce the inter-server communication…
MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are…
Distributed databases, as the core infrastructure software for internet applications, play a critical role in modern cloud services. However, existing distributed databases frequently experience system failures and performance degradation,…
MapReduce is a programming system for distributed processing large-scale data in an efficient and fault tolerant manner on a private, public, or hybrid cloud. MapReduce is extensively used daily around the world as an efficient distributed…
The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…
In this paper we propose a new approach for Big Data mining and analysis. This new approach works well on distributed datasets and deals with data clustering task of the analysis. The approach consists of two main phases, the first phase…
In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…
This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks. The MapReduce programming model and its widely-used open-source…
Large scale clusters leveraging distributed computing frameworks such as MapReduce routinely process data that are on the orders of petabytes or more. The sheer size of the data precludes the processing of the data on a single computer. The…
Recently, graph mining approaches have become very popular, especially in domains such as bioinformatics, chemoinformatics and social networks. In this scope, one of the most challenging tasks is frequent subgraph discovery. This task has…
Networking data analytics is increasingly used for enhanced network visibility and controllability. We draw the similarities between the Software Defined Networking (SDN) architecture and the MapReduce programming model. Inspired by the…
An existing approach for dealing with massive data sets is to stream over the input in few passes and perform computations with sublinear resources. This method does not work for truly massive data where even making a single pass over the…
Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute…
Large-scale distributed computing systems often contain thousands of distributed nodes (machines). Monitoring the conditions of these nodes is important for system management purposes, which, however, can be extremely resource demanding as…
Due to the pervasive diffusion of personal mobile and IoT devices, many ``smart environments'' (e.g., smart cities and smart factories) will be, among others, generators of huge amounts of data. Currently, this is typically achieved through…
Applications requiring real-time processing of large volumes of data have been the main driver for rethinking the traditional cloud, giving rise to novel cloud models. Distributed cloud (DC) is a model that allows users to dynamically…
Understanding and predicting the performance of big data applications running in the cloud or on-premises could help minimise the overall cost of operations and provide opportunities in efforts to identify performance bottlenecks. The…
Hadoop is an open source implementation of the MapReduce Framework in the realm of distributed processing. A Hadoop cluster is a unique type of computational cluster designed for storing and analyzing large data sets across cluster of…