Related papers: NeRCC: Nested-Regression Coded Computing for Resil…
We consider the problem of training a least-squares regression model on a large dataset using gradient descent. The computation is carried out on a distributed system consisting of a master node and multiple worker nodes. Such distributed…
One of the major challenges in using distributed learning to train complicated models with large data sets is dealing with the straggler effect. As a solution, coded computation has recently been proposed to efficiently add redundancy to the…
Coded computing has demonstrated promising results in addressing straggler resiliency in distributed computing systems. However, most coded computing schemes are designed for exact computation, requiring the number of responding servers to…
We consider distributed learning in the presence of slow and unresponsive worker nodes, referred to as stragglers. In order to mitigate the effect of stragglers, gradient coding redundantly assigns partial computations to the worker such…
Deploying Convolutional Neural Networks (CNNs) on resource-constrained devices necessitates efficient management of computational resources, often via distributed environments susceptible to latency from straggler nodes. This paper…
Coded computation, a method that has lately received significant attention, mitigates "stragglers" in distributed computing systems through the use of error-correction coding. First used in vector-matrix multiplication, the range of…
Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a…
Deep regression models typically learn in an end-to-end fashion without explicitly emphasizing a regression-aware representation. Consequently, the learned representations exhibit fragmentation and fail to capture the continuous nature of…
This paper introduces REDC, a comprehensive strategy for offloading computational tasks within mobile Edge Networks (EN) to Distributed Computing (DC) after Rateless Encoding (RE). Despite the efficiency, reliability, and scalability…
This document is an evaluation of the original "Rank-N-Contrast" (arXiv:2210.01189v2) paper published in 2023. This evaluation is done for academic purposes. Deep regression models often fail to capture the continuous nature of sample…
In distributed computing systems, slow working nodes, known as stragglers, can greatly extend finishing times. Coded computing is a technique that enables straggler-resistant computation. Most coded computing techniques presented to date…
Datacenters execute large computational jobs, which are composed of smaller tasks. A job completes when all its tasks finish, so stragglers -- rare, yet extremely slow tasks -- are a major impediment to datacenter performance. Accurately…
We consider distributed gradient descent in the presence of stragglers. Recent work on gradient coding and approximate gradient coding has shown how to add redundancy in distributed gradient descent to guarantee convergence…
In distributed computing, slower nodes (stragglers) usually become a bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient technique that uses principles of error-correcting codes to distribute gradient computation…
Distributed matrix computations -- matrix-matrix or matrix-vector multiplications -- are well-recognized to suffer from the problem of stragglers (slow or failed worker nodes). Much of prior work in this area is (i) either sub-optimal in…
Collaborative mobile edge computing (MEC) has emerged as a promising paradigm to enable low-capability edge nodes to cooperatively execute computation-intensive tasks. However, straggling edge nodes (stragglers) significantly degrade the…
Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes where distributed worker nodes…
In distributed optimization problems, a technique called gradient coding, which involves replicating data points, has been used to mitigate the effect of straggling machines. Recent work has studied approximate gradient coding, which…
Performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework where the dataset is…
The current big data era routinely requires the processing of large-scale data on massive distributed computing clusters. Such large-scale clusters often suffer from the problem of "stragglers", which are defined as slow or failed nodes. The…