Related papers: Optimal Load Allocation for Coded Distributed Comp…

Heterogeneous Coded Computation across Heterogeneous Workers

The coded distributed computing framework enables large-scale machine learning (ML) models to be trained efficiently in a distributed manner, while mitigating the straggler effect. In this work, we consider a multi-task assignment problem in a…

Information Theory · Computer Science 2019-05-21 Yuxuan Sun, Junlin Zhao, Sheng Zhou, Deniz Gündüz

Coded Computation across Shared Heterogeneous Workers with Communication Delay

Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-20 Yuxuan Sun, Fan Zhang, Junlin Zhao, Sheng Zhou, Zhisheng Niu, Deniz Gündüz

Stream Distributed Coded Computing

The emerging large-scale and data-hungry algorithms require the computations to be delegated from a central server to several worker nodes. One major challenge in the distributed computations is to tackle delays and failures caused by the…

Information Theory · Computer Science 2021-03-03 Alejandro Cohen, Guillaume Thiran, Homa Esfahanizadeh, Muriel Médard

Stream Iterative Distributed Coded Computing for Learning Applications in Heterogeneous Systems

To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computations is needed. Thus, it is essential to delegate the computations among several…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-29 Homa Esfahanizadeh, Alejandro Cohen, Muriel Medard

On Heterogeneous Coded Distributed Computing

We consider the recently proposed Coded Distributed Computing (CDC) framework that leverages carefully designed redundant computations to enable coding opportunities that substantially reduce the communication load of distributed computing.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-04 Mehrdad Kiamari, Chenwei Wang, A. Salman Avestimehr

Heterogeneity-aware Gradient Coding for Straggler Tolerance

Gradient descent algorithms are widely used in machine learning. To deal with huge volumes of data, we consider the implementation of gradient descent algorithms in a distributed computing setting where multiple workers compute the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-29 Haozhao Wang, Song Guo, Bin Tang, Ruixuan Li, Chengjie Li

Coded Computation over Heterogeneous Clusters

In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-21 Amirhossein Reisizadeh, Saurav Prakash, Ramtin Pedarsani, Amir Salman Avestimehr

Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load

In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-19 Maximilian Egger, Serge Kas Hanna, Rawad Bitar

A Note on "Optimal Static Load Balancing in Distributed Computer Systems"

The problem of minimizing the mean response time of generic jobs submitted to a heterogeneous distributed computer system is considered in this paper. A static load balancing strategy, in which the decision to redistribute loads does not…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-11-09 S. A. Mondal

Nested Gradient Codes for Straggler Mitigation in Distributed Machine Learning

We consider distributed learning in the presence of slow and unresponsive worker nodes, referred to as stragglers. In order to mitigate the effect of stragglers, gradient coding redundantly assigns partial computations to the worker such…

Information Theory · Computer Science 2022-12-19 Luis Maßny, Christoph Hofmeister, Maximilian Egger, Rawad Bitar, Antonia Wachter-Zeh

How to Optimally Allocate Resources for Coded Distributed Computing?

Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute…

Information Theory · Computer Science 2017-02-24 Qian Yu, Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Latency Analysis of Coded Computation Schemes over Wireless Networks

Large-scale distributed computing systems face two major bottlenecks that limit their scalability: straggler delay caused by the variability of computation times at different worker nodes and communication bottlenecks caused by shuffling…

Information Theory · Computer Science 2017-07-04 Amirhossein Reisizadeh, Ramtin Pedarsani

Combating Computational Heterogeneity in Large-Scale Distributed Computing via Work Exchange

Owing to data-intensive large-scale applications, distributed computation systems have gained significant recent interest due to their ability to run such tasks over a large number of commodity nodes in a time-efficient manner. One of…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-23 Mohamed A. Attia, Ravi Tandon

Hierarchical Coding for Distributed Computing

Coding for distributed computing supports low-latency computation by relieving the burden of straggling workers. While most existing works assume a simple master-worker model, we consider a hierarchical computational structure consisting of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-16 Hyegyeong Park, Kangwook Lee, Jy-yong Sohn, Changho Suh, Jaekyun Moon

Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices

Edge computing has recently emerged as a promising paradigm to boost the performance of distributed learning by leveraging the distributed resources at edge nodes. Architecturally, the introduction of edge nodes adds an additional…

Networking and Internet Architecture · Computer Science 2024-06-18 Weiheng Tang, Jingyi Li, Lin Chen, Xu Chen

A Unified Coding Framework for Distributed Computing with Straggling Servers

We propose a unified coded framework for distributed computing with straggling servers, by introducing a tradeoff between "latency of computation" and "load of communication" for some linear computation tasks. We show that the coded scheme…

Information Theory · Computer Science 2016-10-26 Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Load Balancing for Skewed Streams on Heterogeneous Cluster

Streaming applications frequently encounter skewed workloads and execute on heterogeneous clusters. Optimal resource utilization in such adverse conditions becomes a challenge, as it requires inferring the resource capacities and input…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-03 Muhammad Anis Uddin Nasir, Hiroshi Horii, Marco Serafini, Nicolas Kourtellis, Rudy Raymond, Sarunas Girdzijauskas, Takayuki Osogami

Efficient Replication for Straggler Mitigation in Distributed Computing

Master-worker distributed computing systems use task replication in order to mitigate the effect of slow workers, known as stragglers. Tasks are grouped into batches and assigned to one or more workers for execution. We first consider the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-29 Amir Behrouzi-Far, Emina Soljanin

Coded Matrix Multiplication on a Group-Based Model

Coded distributed computing has been considered as a promising technique which makes large-scale systems robust to the "straggler" workers. Yet, practical system models for distributed computing have not been available that reflect the…

Information Theory · Computer Science 2019-01-17 Muah Kim, Jy-yong Sohn, Jaekyun Moon

Approximate Gradient Coding for Distributed Learning with Heterogeneous Stragglers

In this paper, we propose an optimally structured gradient coding scheme to mitigate the straggler problem in distributed learning. Conventional gradient coding methods often assume homogeneous straggler models or rely on excessive data…

Systems and Control · Electrical Eng. & Systems 2025-10-28 Heekang Song, Wan Choi