Related papers: CAMR: Coded Aggregated MapReduce
This paper studies the computation-communication tradeoff in a heterogeneous MapReduce computing system where each distributed node is equipped with different computation capability. We first obtain an achievable communication load for any…
Large scale clusters leveraging distributed computing frameworks such as MapReduce routinely process data that are on the orders of petabytes or more. The sheer size of the data precludes the processing of the data on a single computer. The…
MapReduce is a commonly used framework for executing data-intensive jobs on distributed server clusters. We introduce a variant implementation of MapReduce, namely "Coded MapReduce", to substantially reduce the inter-server communication…
Distributed computing frameworks such as MapReduce are often used to process large computational jobs. They operate by partitioning each job into smaller tasks executed on different servers. The servers also need to exchange intermediate…
Coded distributed computing introduced by Li et al. in 2015 is an efficient approach to trade computing power to reduce the communication load in general distributed computing frameworks such as MapReduce. In particular, Li et al. show that…
MapReduce is a widely used framework for distributed computing. Data shuffling between the Map phase and Reduce phase of a job involves a large amount of data transfer across servers, which in turn accounts for increase in job completion…
Load balance is important for MapReduce to reduce job duration, increase parallel efficiency, etc. Previous work focuses on coarse-grained scheduling. This study concerns fine-grained scheduling on MapReduce operations. Each operation…
We consider a wireless distributed computing system based on the MapReduce framework, which consists of three phases: \textit{Map}, \textit{Shuffle}, and \textit{Reduce}. The system consists of a set of distributed nodes assigned to compute…
Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute…
We consider the distributed computing framework of MapReduce, which consists of three phases, the Map phase, the Shuffle phase and the Reduce phase. For this framework, we propose the use of binary matrices (with $0,1$ entries) called…
Undoubtedly, the MapReduce is the most powerful programming paradigm in distributed computing. The enhancement of the MapReduce is essential and it can lead the computing faster. Therefore, here are many scheduling algorithms to discuss…
Motivated by mobile edge computing and wireless data centers, we study a wireless distributed computing framework where the distributed nodes exchange information over a wireless interference network. Our framework follows the structure of…
We consider a distributed computing framework where the distributed nodes have different communication capabilities, motivated by the heterogeneous networks in data centers and mobile edge computing systems. Following the structure of…
Coded distributed computing framework enables large-scale machine learning (ML) models to be trained efficiently in a distributed manner, while mitigating the straggler effect. In this work, we consider a multi-task assignment problem in a…
We consider a MapReduce-type task running in a distributed computing model which consists of ${K}$ edge computing nodes distributed across the edge of the network and a Master node that assists the edge nodes to compute output functions.…
Coded distributed computing (CDC) introduced by Li et. al. is an effective technique to trade computation load for communication load in a MapReduce framework. CDC achieves an optimal trade-off by duplicating map computations at $r$…
In this paper, we present a coded computation (CC) scheme for distributed computation of the inference phase of machine learning (ML) tasks, specifically, the task of image classification. Building upon Agrawal et al.~2022, the proposed…
MapReduce has proven to be one of the most useful paradigms in the revolution of distributed computing, where cloud services and cluster computing become the standard venue for computing. The federation of cloud and big data activities is…
Since its introduction in 2004, the MapReduce framework has become one of the standard approaches in massive distributed and parallel computation. In contrast to its intensive use in practise, theoretical footing is still limited and only…
Distributed learning platforms for processing large scale data-sets are becoming increasingly prevalent. In typical distributed implementations, a centralized master node breaks the data-set into smaller batches for parallel processing…