Related papers: Coded matrix computation with gradient coding
In a large-scale and distributed matrix multiplication problem $C=A^{\intercal}B$, where $C\in\mathbb{R}^{r\times t}$, the coded computation plays an important role to effectively deal with "stragglers" (distributed computations that may…
Supporting multiple partial computations efficiently at each of the workers is a keystone in distributed coded computing in order to speed up computations and to fully exploit the resources of heterogeneous workers in terms of…
Coded computation techniques provide robustness against straggling workers in distributed computing. However, most of the existing schemes require exact provisioning of the straggling behaviour and ignore the computations carried out by…
We consider a generalization of the gradient coding framework where a dataset is divided across $n$ workers and each worker transmits to a master node one or more linear combinations of the gradients over its assigned data subsets. Unlike…
We consider a large-scale matrix multiplication problem where the computation is carried out using a distributed system with a master node and multiple worker nodes, where each worker can store parts of the input matrices. We propose a…
Distributed computing systems are well-known to suffer from the problem of slow or failed nodes; these are referred to as stragglers. Straggler mitigation (for distributed matrix computations) has recently been investigated from the…
We consider the problem of training a least-squares regression model on a large dataset using gradient descent. The computation is carried out on a distributed system consisting of a master node and multiple worker nodes. Such distributed…
We study the problem of computing matrix chain multiplications in a distributed computing cluster. In such systems, performance is often limited by the straggler problem, where the slowest worker dominates the overall computation latency.…
In distributed matrix multiplication, a common scenario is to assign each worker a fraction of the multiplication task, by partitioning the input matrices into smaller submatrices. In particular, by dividing two input matrices into…
This paper addresses the gradient coding and coded matrix multiplication problems in distributed optimization and coded computing. We present a numerically stable binary coding method which overcomes the drawbacks of the \textit{Fractional…
Matrix computations are a fundamental building-block of edge computing systems, with a major recent uptick in demand due to their use in AI/ML training and inference procedures. Existing approaches for distributing matrix computations…
The cost of data input can dominate the run-time of quantum algorithms. Here, we consider data input of arithmetically structured matrices via block encoding circuits, the input model for the quantum singular value transform and related…
Distributed matrix computations over large clusters can suffer from the problem of slow or failed worker nodes (called stragglers) which can dominate the overall job execution time. Coded computation utilizes concepts from erasure coding to…
Coding for distributed computing supports low-latency computation by relieving the burden of straggling workers. While most existing works assume a simple master-worker model, we consider a hierarchical computational structure consisting of…
Gradient descent (GD) methods are commonly employed in machine learning problems to optimize the parameters of the model in an iterative fashion. For problems with massive datasets, computations are distributed to many parallel computing…
Multi-party computation (MPC) is promising for designing privacy-preserving machine learning algorithms at edge networks. An emerging approach is coded-MPC (CMPC), which advocates the use of coded computation to improve the performance of…
Coded distributed computing has been considered as a promising technique which makes large-scale systems robust to the "straggler" workers. Yet, practical system models for distributed computing have not been available that reflect the…
Matrix multiplication is a fundamental building block for large scale computations arising in various applications, including machine learning. There has been significant recent interest in using coding to speed up distributed matrix…
Straggler nodes are well-known bottlenecks of distributed matrix computations which induce reductions in computation/communication speeds. A common strategy for mitigating such stragglers is to incorporate Reed-Solomon based MDS (maximum…
Tensor operations, such as matrix multiplication, are central to large-scale machine learning applications. For user-driven tasks these operations can be carried out on a distributed computing platform with a master server at the user side…