Related papers: Coded Matrix Multiplication on a Group-Based Model
Coding for distributed computing supports low-latency computation by relieving the burden of straggling workers. While most existing works assume a simple master-worker model, we consider a hierarchical computational structure consisting of…
Building on the previous work of Lee et al. and Ferdinand et al. on coded computation, we propose a sequential approximation framework for solving optimization problems in a distributed manner. In a distributed computation system, latency…
Coded distributed computing framework enables large-scale machine learning (ML) models to be trained efficiently in a distributed manner, while mitigating the straggler effect. In this work, we consider a multi-task assignment problem in a…
Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the…
Gradient descent (GD) methods are commonly employed in machine learning problems to optimize the parameters of the model in an iterative fashion. For problems with massive datasets, computations are distributed to many parallel computing…
Large-scale distributed computing systems face two major bottlenecks that limit their scalability: straggler delay caused by the variability of computation times at different worker nodes and communication bottlenecks caused by shuffling…
The emerging large-scale and data-hungry algorithms require the computations to be delegated from a central server to several worker nodes. One major challenge in the distributed computations is to tackle delays and failures caused by the…
The current BigData era routinely requires the processing of large scale data on massive distributed computing clusters. Such large scale clusters often suffer from the problem of "stragglers", which are defined as slow or failed nodes. The…
Coded matrix multiplication is a technique to enable straggler-resistant multiplication of large matrices in distributed computing systems. In this paper, we first present a conceptual framework to represent the division of work amongst…
In distributed computing systems slow working nodes, known as stragglers, can greatly extend finishing times. Coded computing is a technique that enables straggler-resistant computation. Most coded computing techniques presented to date…
Slow working nodes, known as stragglers, can greatly reduce the speed of distributed computation. Coded matrix multiplication is a recently introduced technique that enables straggler-resistant distributed multiplication of large matrices.…
We consider the problem of coded distributed computing where a large linear computational job, such as a matrix multiplication, is divided into $k$ smaller tasks, encoded using an $(n,k)$ linear code, and performed over $n$ distributed…
Matrix multiplication is a fundamental building block for large scale computations arising in various applications, including machine learning. There has been significant recent interest in using coding to speed up distributed matrix…
Coded computing is an effective technique to mitigate "stragglers" in large-scale and distributed matrix multiplication. In particular, univariate polynomial codes have been shown to be effective in straggler mitigation by making the…
In a large-scale and distributed matrix multiplication problem $C=A^{\intercal}B$, where $C\in\mathbb{R}^{r\times t}$, the coded computation plays an important role to effectively deal with "stragglers" (distributed computations that may…
Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a…
Coded computation techniques provide robustness against straggling workers in distributed computing. However, most of the existing schemes require exact provisioning of the straggling behaviour and ignore the computations carried out by…
We consider the distributed computing problem of multiplying a set of vectors with a matrix. For this scenario, Li et al. recently presented a unified coding framework and showed a fundamental tradeoff between computational delay and…
We study the problem of computing matrix chain multiplications in a distributed computing cluster. In such systems, performance is often limited by the straggler problem, where the slowest worker dominates the overall computation latency.…
We consider a MapReduce-type task running in a distributed computing model which consists of ${K}$ edge computing nodes distributed across the edge of the network and a Master node that assists the edge nodes to compute output functions.…