Related papers: On Gradient Coding with Partial Recovery

Fundamental Limits of Approximate Gradient Coding

It has been established that when the gradient coding problem is distributed among $n$ servers, the computation load (number of stored data partitions) of each worker is at least $s+1$ in order to resists $s$ stragglers. This scheme incurs…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-25 Sinong Wang , Jiashang Liu , Ness Shroff

Gradient Coding with Clustering and Multi-message Communication

Gradient descent (GD) methods are commonly employed in machine learning problems to optimize the parameters of the model in an iterative fashion. For problems with massive datasets, computations are distributed to many parallel computing…

Information Theory · Computer Science 2019-03-06 Emre Ozfatura , Deniz Gunduz , Sennur Ulukus

Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning

Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for…

Information Theory · Computer Science 2023-04-26 Qi Wang , Ying Cui , Chenglin Li , Junni Zou , Hongkai Xiong

Hierarchical Gradient Coding: From Optimal Design to Privacy at Intermediate Nodes

Gradient coding is a distributed computing technique for computing gradient vectors over large datasets by outsourcing partial computations to multiple workers, typically connected directly to the server. In this work, we investigate…

Information Theory · Computer Science 2025-11-24 Ali Gholami , Tayyebeh Jahani-Nezhad , Kai Wan , Giuseppe Caire

Nested Gradient Codes for Straggler Mitigation in Distributed Machine Learning

We consider distributed learning in the presence of slow and unresponsive worker nodes, referred to as stragglers. In order to mitigate the effect of stragglers, gradient coding redundantly assigns partial computations to the worker such…

Information Theory · Computer Science 2022-12-19 Luis Maßny , Christoph Hofmeister , Maximilian Egger , Rawad Bitar , Antonia Wachter-Zeh

Heterogeneity-aware Gradient Coding for Straggler Tolerance

Gradient descent algorithms are widely used in machine learning. In order to deal with huge volume of data, we consider the implementation of gradient descent algorithms in a distributed computing setting where multiple workers compute the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-29 Haozhao Wang , Song Guo , Bin Tang , Ruixuan Li , Chengjie Li

Numerically Stable Binary Gradient Coding

A major hurdle in machine learning is scalability to massive datasets. One approach to overcoming this is to distribute the computational tasks among several workers. \textit{Gradient coding} has been recently proposed in distributed…

Information Theory · Computer Science 2020-09-16 Neophytos Charalambides , Hessam Mahdavifar , Alfred O. Hero

Leveraging partial stragglers within gradient coding

Within distributed learning, workers typically compute gradients on their assigned dataset chunks and send them to the parameter server (PS), which aggregates them to compute either an exact or approximate version of $\nabla L$ (gradient of…

Information Theory · Computer Science 2024-11-19 Aditya Ramamoorthy , Ruoyu Meng , Vrinda S. Girimaji

Sequential Gradient Coding For Straggler Mitigation

In distributed computing, slower nodes (stragglers) usually become a bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient technique that uses principles of error-correcting codes to distribute gradient computation…

Machine Learning · Computer Science 2023-06-29 M. Nikhil Krishnan , MohammadReza Ebrahimi , Ashish Khisti

Optimal Communication-Computation Trade-Off in Heterogeneous Gradient Coding

Gradient coding allows a master node to derive the aggregate of the partial gradients, calculated by some worker nodes over the local data sets, with minimum communication cost, and in the presence of stragglers. In this paper, for gradient…

Information Theory · Computer Science 2021-03-03 Tayyebeh Jahani-Nezhad , Mohammad Ali Maddah-Ali

Communication-Computation Efficient Gradient Coding

This paper develops coding techniques to reduce the running time of distributed learning tasks. It characterizes the fundamental tradeoff to compute gradients (and more generally vector summations) in terms of three parameters: computation…

Machine Learning · Statistics 2018-02-13 Min Ye , Emmanuel Abbe

Optimization-based Block Coordinate Gradient Coding

Existing gradient coding schemes introduce identical redundancy across the coordinates of gradients and hence cannot fully utilize the computation results from partial stragglers. This motivates the introduction of diverse redundancies…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-21 Qi Wang , Ying Cui , Chenglin Li , Junni Zou , Hongkai Xiong

Polynomially Coded Regression: Optimal Straggler Mitigation via Data Encoding

We consider the problem of training a least-squares regression model on a large dataset using gradient descent. The computation is carried out on a distributed system consisting of a master node and multiple worker nodes. Such distributed…

Information Theory · Computer Science 2018-05-28 Songze Li , Seyed Mohammadreza Mousavi Kalan , Qian Yu , Mahdi Soltanolkotabi , A. Salman Avestimehr

Communication-Efficient Gradient Coding for Straggler Mitigation in Distributed Learning

Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, need to overcome two limitations: delays caused by slow running machines called 'stragglers', and…

Information Theory · Computer Science 2020-05-15 Swanand Kadhe , O. Ozan Koyluoglu , Kannan Ramchandran

Coded matrix computation with gradient coding

Polynomial based approaches, such as the Mat-Dot and entangled polynomial codes (EPC) have been used extensively within coded matrix computations to obtain schemes with good recovery thresholds. However, these schemes are well-recognized to…

Information Theory · Computer Science 2023-05-11 Kyungrak Son , Aditya Ramamoorthy

Communication-Efficient Approximate Gradient Coding

Large-scale distributed learning aims at minimizing a loss function $L$ that depends on a training dataset with respect to a $d$-length parameter vector. The distributed cluster typically consists of a parameter server (PS) and multiple…

Information Theory · Computer Science 2026-03-25 Sifat Munim , Aditya Ramamoorthy

Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices

Edge computing has recently emerged as a promising paradigm to boost the performance of distributed learning by leveraging the distributed resources at edge nodes. Architecturally, the introduction of edge nodes adds an additional…

Networking and Internet Architecture · Computer Science 2024-06-18 Weiheng Tang , Jingyi Li , Lin Chen , Xu Chen

Gradient Coding from Cyclic MDS Codes and Expander Graphs

Gradient coding is a technique for straggler mitigation in distributed learning. In this paper we design novel gradient codes using tools from classical coding theory, namely, cyclic MDS codes, which compare favorably with existing…

Information Theory · Computer Science 2019-07-09 Netanel Raviv , Itzhak Tamo , Rashish Tandon , Alexandros G. Dimakis

Computation Scheduling for Distributed Machine Learning with Straggling Workers

We study scheduling of computation tasks across n workers in a large scale distributed learning problem with the help of a master. Computation and communication delays are assumed to be random, and redundant computations are assigned to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-08 Mohammad Mohammadi Amiri , Deniz Gunduz

Improving Distributed Gradient Descent Using Reed-Solomon Codes

Today's massively-sized datasets have made it necessary to often perform computations on them in a distributed manner. In principle, a computational task is divided into subtasks which are distributed over a cluster operated by a…

Information Theory · Computer Science 2017-06-20 Wael Halbawi , Navid Azizan-Ruhi , Fariborz Salehi , Babak Hassibi