Related papers: Stream Iterative Distributed Coded Computing for L…

Stream Distributed Coded Computing

The emerging large-scale and data-hungry algorithms require the computations to be delegated from a central server to several worker nodes. One major challenge in the distributed computations is to tackle delays and failures caused by the…

Information Theory · Computer Science 2021-03-03 Alejandro Cohen , Guillaume Thiran , Homa Esfahanizadeh , Muriel Médard

Coded Computation across Shared Heterogeneous Workers with Communication Delay

Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-20 Yuxuan Sun , Fan Zhang , Junlin Zhao , Sheng Zhou , Zhisheng Niu , Deniz Gündüz

Heterogeneous Coded Computation across Heterogeneous Workers

Coded distributed computing framework enables large-scale machine learning (ML) models to be trained efficiently in a distributed manner, while mitigating the straggler effect. In this work, we consider a multi-task assignment problem in a…

Information Theory · Computer Science 2019-05-21 Yuxuan Sun , Junlin Zhao , Sheng Zhou , Deniz Gündüz

Optimal Load Allocation for Coded Distributed Computation in Heterogeneous Clusters

Recently, coding has been a useful technique to mitigate the effect of stragglers in distributed computing. However, coding in this context has been mainly explored under the assumption of homogeneous workers, although the real-world…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-18 DaeJin Kim , Hyegyeong Park , Junkyun Choi

Hierarchical Coding for Distributed Computing

Coding for distributed computing supports low-latency computation by relieving the burden of straggling workers. While most existing works assume a simple master-worker model, we consider a hierarchical computational structure consisting of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-16 Hyegyeong Park , Kangwook Lee , Jy-yong Sohn , Changho Suh , Jaekyun Moon

Computation Scheduling for Distributed Machine Learning with Straggling Workers

We study scheduling of computation tasks across n workers in a large scale distributed learning problem with the help of a master. Computation and communication delays are assumed to be random, and redundant computations are assigned to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-08 Mohammad Mohammadi Amiri , Deniz Gunduz

Distributed Computations with Layered Resolution

Modern computationally-heavy applications are often time-sensitive, demanding distributed strategies to accelerate them. On the other hand, distributed computing suffers from the bottleneck of slow workers in practice. Distributed coded…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-03 Homa Esfahanizadeh , Alejandro Cohen , Muriel Médard , Shlomo Shamai

Coded Distributed Computing: Performance Limits and Code Designs

We consider the problem of coded distributed computing where a large linear computational job, such as a matrix multiplication, is divided into $k$ smaller tasks, encoded using an $(n,k)$ linear code, and performed over $n$ distributed…

Information Theory · Computer Science 2019-06-25 Mohammad Vahid Jamali , Mahdi Soleymani , Hessam Mahdavifar

Distributed Optimization using Heterogeneous Compute Systems

Hardware compute power has been growing at an unprecedented rate in recent years. The utilization of such advancements plays a key role in producing better results in less time -- both in academia and industry. However, merging the existing…

Machine Learning · Computer Science 2021-10-19 Vineeth S

Heterogeneous Coded Distributed Computing: Joint Design of File Allocation and Function Assignment

This paper studies the computation-communication tradeoff in a heterogeneous MapReduce computing system where each distributed node is equipped with different computation capability. We first obtain an achievable communication load for any…

Information Theory · Computer Science 2019-08-20 Fan Xu , Meixia Tao

How to Optimally Allocate Resources for Coded Distributed Computing?

Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute…

Information Theory · Computer Science 2017-02-24 Qian Yu , Songze Li , Mohammad Ali Maddah-Ali , A. Salman Avestimehr

Coded Distributed Computing for Hierarchical Multi-task Learning

In this paper, we consider a hierarchical distributed multi-task learning (MTL) system where distributed users wish to jointly learn different models orchestrated by a central server with the help of a layer of multiple relays. Since the…

Information Theory · Computer Science 2022-12-19 Haoyang Hu , Songze Li , Minquan Cheng , Youlong Wu

Combating Computational Heterogeneity in Large-Scale Distributed Computing via Work Exchange

Owing to data-intensive large-scale applications, distributed computation systems have gained significant recent interest, due to their ability of running such tasks over a large number of commodity nodes in a time efficient manner. One of…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-23 Mohamed A. Attia , Ravi Tandon

On Batch-Processing Based Coded Computing for Heterogeneous Distributed Computing Systems

In recent years, coded distributed computing (CDC) has attracted significant attention, because it can efficiently facilitate many delay-sensitive computation tasks against unexpected latencies in distributed computing systems. Despite such…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-12 Baoqian Wang , Junfei Xie , Kejie Lu , Yan Wan , Shengli Fu

Incentive Mechanism Design for Distributed Coded Machine Learning

A distributed machine learning platform needs to recruit many heterogeneous worker nodes to finish computation simultaneously. As a result, the overall performance may be degraded due to straggling workers. By introducing redundancy into…

Computer Science and Game Theory · Computer Science 2020-12-17 Ningning Ding , Zhixuan Fang , Lingjie Duan , Jianwei Huang

Coded Computing for Resilient Distributed Computing: A Learning-Theoretic Framework

Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a…

Machine Learning · Computer Science 2026-03-26 Parsa Moradi , Behrooz Tahmasebi , Mohammad Ali Maddah-Ali

Hierarchical Coded Computation

Coded computation is a method to mitigate "stragglers" in distributed computing systems through the use of error correction coding that has lately received significant attention. First used in vector-matrix multiplication, the range of…

Information Theory · Computer Science 2018-06-28 Nuwan Ferdinand , Stark Draper

Coded Computation over Heterogeneous Clusters

In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-21 Amirhossein Reisizadeh , Saurav Prakash , Ramtin Pedarsani , Amir Salman Avestimehr

A Unified Coding Framework for Distributed Computing with Straggling Servers

We propose a unified coded framework for distributed computing with straggling servers, by introducing a tradeoff between "latency of computation" and "load of communication" for some linear computation tasks. We show that the coded scheme…

Information Theory · Computer Science 2016-10-26 Songze Li , Mohammad Ali Maddah-Ali , A. Salman Avestimehr

Coded Computing via Binary Linear Codes: Designs and Performance Limits

We consider the problem of coded distributed computing where a large linear computational job, such as a matrix multiplication, is divided into $k$ smaller tasks, encoded using an $(n,k)$ linear code, and performed over $n$ distributed…

Information Theory · Computer Science 2021-10-06 Mahdi Soleymani , Mohammad Vahid Jamali , Hessam Mahdavifar