Related papers: Hierarchical Coding for Distributed Computing
Coded distributed computing has been considered as a promising technique which makes large-scale systems robust to the "straggler" workers. Yet, practical system models for distributed computing have not been available that reflect the…
To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computations is needed. Thus, it is essential to delegate the computations among several…
Coded computation is a method to mitigate "stragglers" in distributed computing systems through the use of error correction coding that has lately received significant attention. First used in vector-matrix multiplication, the range of…
In distributed computing systems slow working nodes, known as stragglers, can greatly extend finishing times. Coded computing is a technique that enables straggler-resistant computation. Most coded computing techniques presented to date…
The emerging large-scale and data-hungry algorithms require the computations to be delegated from a central server to several worker nodes. One major challenge in the distributed computations is to tackle delays and failures caused by the…
Slow working nodes, known as stragglers, can greatly reduce the speed of distributed computation. Coded matrix multiplication is a recently introduced technique that enables straggler-resistant distributed multiplication of large matrices.…
Elasticity is offered by cloud service providers to exploit under-utilized computing resources. The low-cost elastic nodes can leave and join any time during the computation cycle. The possibility of elastic events occurring together with…
Recently, coding has been a useful technique to mitigate the effect of stragglers in distributed computing. However, coding in this context has been mainly explored under the assumption of homogeneous workers, although the real-world…
In this paper, we consider a hierarchical distributed multi-task learning (MTL) system where distributed users wish to jointly learn different models orchestrated by a central server with the help of a layer of multiple relays. Since the…
We consider the distributed computing problem of multiplying a set of vectors with a matrix. For this scenario, Li et al. recently presented a unified coding framework and showed a fundamental tradeoff between computational delay and…
In this work, we consider the problem of distributed computing of functions of structured sources, focusing on the classical setting of two correlated sources and one user that seeks the outcome of the function while benefiting from…
Coded matrix multiplication is a technique to enable straggler-resistant multiplication of large matrices in distributed computing systems. In this paper, we first present a conceptual framework to represent the division of work amongst…
Large-scale distributed computing systems face two major bottlenecks that limit their scalability: straggler delay caused by the variability of computation times at different worker nodes and communication bottlenecks caused by shuffling…
Coded distributed computing framework enables large-scale machine learning (ML) models to be trained efficiently in a distributed manner, while mitigating the straggler effect. In this work, we consider a multi-task assignment problem in a…
We propose a unified coded framework for distributed computing with straggling servers, by introducing a tradeoff between "latency of computation" and "load of communication" for some linear computation tasks. We show that the coded scheme…
We consider the setting of a master server who possesses confidential data (genomic, medical data, etc.) and wants to run intensive computations on it, as part of a machine learning algorithm for example. The master wants to distribute…
Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the…
Modern computationally-heavy applications are often time-sensitive, demanding distributed strategies to accelerate them. On the other hand, distributed computing suffers from the bottleneck of slow workers in practice. Distributed coded…
Gradient descent algorithms are widely used in machine learning. In order to deal with huge volume of data, we consider the implementation of gradient descent algorithms in a distributed computing setting where multiple workers compute the…
We consider the problem of coded distributed computing where a large linear computational job, such as a matrix multiplication, is divided into $k$ smaller tasks, encoded using an $(n,k)$ linear code, and performed over $n$ distributed…