Related papers: Coded Distributed Computing over Packet Erasure Ch…
Large-scale distributed computing systems face two major bottlenecks that limit their scalability: straggler delay caused by the variability of computation times at different worker nodes and communication bottlenecks caused by shuffling…
Coded computation is a method to mitigate "stragglers" in distributed computing systems through the use of error correction coding that has lately received significant attention. First used in vector-matrix multiplication, the range of…
Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the…
The current big data era routinely requires the processing of large-scale data on massive distributed computing clusters. Such large-scale clusters often suffer from the problem of "stragglers", which are defined as slow or failed nodes. The…
In distributed computing systems, it is well recognized that worker nodes that are slow (called stragglers) tend to dominate the overall job execution time. Coded computation utilizes concepts from erasure coding to mitigate the effect of…
We consider the problem of coded distributed computing where a large linear computational job, such as a matrix multiplication, is divided into $k$ smaller tasks, encoded using an $(n,k)$ linear code, and performed over $n$ distributed…
Distributed computing has become a common approach for large-scale computation of tasks due to benefits such as high reliability, scalability, computation speed, and cost-effectiveness. However, distributed computing faces critical issues…
Distributed computing platforms typically assume the availability of reliable and dedicated connections among the processors. This work considers an alternative scenario, relevant for wireless data centers and federated learning, in which…
Distributed matrix computations over large clusters can suffer from the problem of slow or failed worker nodes (called stragglers) which can dominate the overall job execution time. Coded computation utilizes concepts from erasure coding to…
Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a…
In large-scale distributed linear transform problems, coded computation plays an important role in dealing effectively with "stragglers" (distributed computations that may be delayed by a few slow or faulty processors). We propose a coded…
The emerging large-scale and data-hungry algorithms require the computations to be delegated from a central server to several worker nodes. One major challenge in the distributed computations is to tackle delays and failures caused by the…
Distributed computation is a framework used to break down a complex computational task into smaller tasks and distribute them among computational nodes. Erasure correction codes have recently been introduced and have become a popular…
We consider the problem of coded distributed computing where a large linear computational job, such as a matrix multiplication, is divided into $k$ smaller tasks, encoded using an $(n,k)$ linear code, and performed over $n$ distributed…
In cloud computing systems, slow processing nodes, often referred to as "stragglers", can significantly extend the computation time. Recent results have shown that error correction coding can be used to reduce the effect of stragglers. In…
The coded distributed computing framework enables large-scale machine learning (ML) models to be trained efficiently in a distributed manner, while mitigating the straggler effect. In this work, we consider a multi-task assignment problem in a…
Inexpensive cloud services, such as serverless computing, are often vulnerable to straggling nodes that increase end-to-end latency for distributed computation. We propose and implement simple yet principled approaches for straggler…
Slow running or straggler tasks can significantly reduce computation speed in distributed computation. Recently, coding-theory-inspired approaches have been applied to mitigate the effect of straggling, through embedding redundancy in…
Distributed matrix computations -- matrix-matrix or matrix-vector multiplications -- are well-recognized to suffer from the problem of stragglers (slow or failed worker nodes). Much of prior work in this area is (i) either sub-optimal in…
We consider the problem of computing the convolution of two long vectors using parallel processing units in the presence of "stragglers". Stragglers refer to the small fraction of faulty or slow processors that delays the entire computation…