Related papers: Approximate Gradient Coding with Optimal Decoding

Gradient Coding from Cyclic MDS Codes and Expander Graphs

Gradient coding is a technique for straggler mitigation in distributed learning. In this paper we design novel gradient codes using tools from classical coding theory, namely, cyclic MDS codes, which compare favorably with existing…

Information Theory · Computer Science 2019-07-09 Netanel Raviv , Itzhak Tamo , Rashish Tandon , Alexandros G. Dimakis

Gradient Coding Based on Block Designs for Mitigating Adversarial Stragglers

Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, suffer from slow running machines, called 'stragglers'. Gradient coding is a coding-theoretic framework to…

Information Theory · Computer Science 2019-05-01 Swanand Kadhe , O. Ozan Koyluoglu , Kannan Ramchandran

Approximate Gradient Coding via Sparse Random Graphs

Distributed algorithms are often beset by the straggler effect, where the slowest compute nodes in the system dictate the overall running time. Coding-theoretic techniques have been recently proposed to mitigate stragglers via algorithmic…

Machine Learning · Statistics 2017-11-21 Zachary Charles , Dimitris Papailiopoulos , Jordan Ellenberg

Approximate Gradient Coding for Distributed Learning with Heterogeneous Stragglers

In this paper, we propose an optimally structured gradient coding scheme to mitigate the straggler problem in distributed learning. Conventional gradient coding methods often assume homogeneous straggler models or rely on excessive data…

Systems and Control · Electrical Eng. & Systems 2025-10-28 Heekang Song , Wan Choi

Fundamental Limits of Approximate Gradient Coding

It has been established that when the gradient coding problem is distributed among $n$ servers, the computation load (number of stored data partitions) of each worker is at least $s+1$ in order to resists $s$ stragglers. This scheme incurs…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-25 Sinong Wang , Jiashang Liu , Ness Shroff

Stochastic Gradient Coding for Straggler Mitigation in Distributed Learning

We consider distributed gradient descent in the presence of stragglers. Recent work on \em gradient coding \em and \em approximate gradient coding \em have shown how to add redundancy in distributed gradient descent to guarantee convergence…

Information Theory · Computer Science 2019-05-15 Rawad Bitar , Mary Wootters , Salim El Rouayheb

Optimization-based Block Coordinate Gradient Coding

Existing gradient coding schemes introduce identical redundancy across the coordinates of gradients and hence cannot fully utilize the computation results from partial stragglers. This motivates the introduction of diverse redundancies…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-21 Qi Wang , Ying Cui , Chenglin Li , Junni Zou , Hongkai Xiong

Communication-Efficient Approximate Gradient Coding

Large-scale distributed learning aims at minimizing a loss function $L$ that depends on a training dataset with respect to a $d$-length parameter vector. The distributed cluster typically consists of a parameter server (PS) and multiple…

Information Theory · Computer Science 2026-03-25 Sifat Munim , Aditya Ramamoorthy

Improving Distributed Gradient Descent Using Reed-Solomon Codes

Today's massively-sized datasets have made it necessary to often perform computations on them in a distributed manner. In principle, a computational task is divided into subtasks which are distributed over a cluster operated by a…

Information Theory · Computer Science 2017-06-20 Wael Halbawi , Navid Azizan-Ruhi , Fariborz Salehi , Babak Hassibi

Heterogeneity-aware Gradient Coding for Straggler Tolerance

Gradient descent algorithms are widely used in machine learning. In order to deal with huge volume of data, we consider the implementation of gradient descent algorithms in a distributed computing setting where multiple workers compute the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-29 Haozhao Wang , Song Guo , Bin Tang , Ruixuan Li , Chengjie Li

Communication-Efficient Approximate Gradient Coding for Distributed Learning in Heterogeneous Systems

We propose a communication-efficient optimally structured gradient coding scheme to jointly address straggler resilience and communication efficiency in heterogeneous distributed learning. By establishing a unified framework that…

Systems and Control · Electrical Eng. & Systems 2026-05-18 Heekang Song , Wan Choi

Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning

Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for…

Information Theory · Computer Science 2023-04-26 Qi Wang , Ying Cui , Chenglin Li , Junni Zou , Hongkai Xiong

Sequential Gradient Coding For Straggler Mitigation

In distributed computing, slower nodes (stragglers) usually become a bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient technique that uses principles of error-correcting codes to distribute gradient computation…

Machine Learning · Computer Science 2023-06-29 M. Nikhil Krishnan , MohammadReza Ebrahimi , Ashish Khisti

Straggler-Robust Distributed Optimization with the Parameter Server Utilizing Coded Gradient

Optimization in distributed networks plays a central role in almost all distributed machine learning problems. In principle, the use of distributed task allocation has reduced the computational time, allowing better response rates and…

Optimization and Control · Mathematics 2020-07-28 Elie Atallah , Nazanin Rahnavard , Chinwendu Enyioha

Approximate Gradient Coding for Heterogeneous Nodes

In distributed machine learning (DML), the training data is distributed across multiple worker nodes to perform the underlying training in parallel. One major problem affecting the performance of DML algorithms is presence of stragglers.…

Information Theory · Computer Science 2021-05-14 Amogh Johri , Arti Yardi , Tejas Bodas

CoDGraD: A Code-based Distributed Gradient Descent Scheme for Decentralized Convex Optimization

In this paper, we consider a large network containing many regions such that each region is equipped with a worker with some data processing and communication capability. For such a network, some workers may become stragglers due to the…

Systems and Control · Electrical Eng. & Systems 2022-04-14 Elie Atallah , Nazanin Rahnavard , Qiyu Sun

Communication-Efficient Gradient Coding for Straggler Mitigation in Distributed Learning

Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, need to overcome two limitations: delays caused by slow running machines called 'stragglers', and…

Information Theory · Computer Science 2020-05-15 Swanand Kadhe , O. Ozan Koyluoglu , Kannan Ramchandran

Robust Gradient Descent via Moment Encoding with LDPC Codes

This paper considers the problem of implementing large-scale gradient descent algorithms in a distributed computing setting in the presence of {\em straggling} processors. To mitigate the effect of the stragglers, it has been previously…

Machine Learning · Statistics 2019-01-04 Raj Kumar Maity , Ankit Singh Rawat , Arya Mazumdar

Gradient Coding in Decentralized Learning for Evading Stragglers

In this paper, we consider a decentralized learning problem in the presence of stragglers. Although gradient coding techniques have been developed for distributed learning to evade stragglers, where the devices send encoded gradients with…

Machine Learning · Computer Science 2024-06-17 Chengxi Li , Mikael Skoglund

Near-Optimal Straggler Mitigation for Distributed Gradient Methods

Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes where distributed worker nodes…

Information Theory · Computer Science 2017-10-30 Songze Li , Seyed Mohammadreza Mousavi Kalan , A. Salman Avestimehr , Mahdi Soltanolkotabi