Related papers: Biased Compression in Gradient Coding for Distribu…
Distributed learning, particularly Federated Learning (FL), faces a significant bottleneck in communication cost, especially the uplink transmission of client-to-server updates, which is often constrained by asymmetric bandwidth…
In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact that biased compressors often show…
In this paper, we consider a decentralized learning problem in the presence of stragglers. Although gradient coding techniques have been developed for distributed learning to evade stragglers, where the devices send encoded gradients with…
This paper considers the problem of distributed learning (DL) in the presence of stragglers. For this problem, DL methods based on gradient coding have been widely investigated, which redundantly distribute the training data to the workers…
We propose a communication-efficient optimally structured gradient coding scheme to jointly address straggler resilience and communication efficiency in heterogeneous distributed learning. By establishing a unified framework that…
Distributed learning algorithms, such as the ones employed in Federated Learning (FL), require communication compression to reduce the cost of client uploads. The compression methods used in practice are often biased, making error feedback…
Lossy gradient compression, with either unbiased or biased compressors, has become a key tool to avoid the communication bottleneck in centrally coordinated distributed training of machine learning models. We analyze the performance of two…
In federated learning (FL) systems, e.g., wireless networks, the communication cost between the clients and the central server can often be a bottleneck. To reduce the communication cost, the paradigm of communication compression has become…
The widespread adoption of distributed learning to train a global model from local data has been hindered by the challenge posed by stragglers. Recent attempts to mitigate this issue through gradient coding have proved difficult due to the…
In this article, we address the problem of federated learning in the presence of stragglers. For this problem, a coded federated learning framework has been proposed, where the central server aggregates gradients received from the…
Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, need to overcome two limitations: delays caused by slow-running machines called 'stragglers', and…
Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been growing interest in using gradient compression to reduce the communication overhead of distributed training.…
Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…
Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed compute systems. A key bottleneck of such systems is the communication overhead for exchanging information across…
On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited…
Communication compression techniques are of growing interest for solving the decentralized optimization problem under limited communication, where the global objective is to minimize the average of local cost functions over a multi-agent…
Coded computation can be used to speed up distributed learning in the presence of straggling workers. Partial recovery of the gradient vector can further reduce the computation time at each iteration; however, this can result in biased…
Communication overhead is the key challenge for distributed training. Gradient compression is a widely used approach to reduce communication traffic. When combined with a parallel communication mechanism such as pipelining, gradient…
The recent many-fold increase in the size of deep neural networks makes efficient distributed training challenging. Many proposals exploit the compressibility of the gradients and propose lossy compression techniques to speed up the…
Straggler effects are known to degrade federated learning (FL) performance. In this paper, we investigate FL over wireless networks in the presence of communication stragglers, where the power-constrained clients collaboratively train a…