English
Related papers

Related papers: Gradient Coding with Dynamic Clustering for Stragg…

200 papers

Distributed implementations are crucial in speeding up large scale machine learning applications. Distributed gradient descent (GD) is widely employed to parallelize the learning task by distributing the dataset across multiple workers. A…

Information Theory · Computer Science 2021-03-02 Baturalp Buyukates , Emre Ozfatura , Sennur Ulukus , Deniz Gunduz

Gradient descent (GD) methods are commonly employed in machine learning problems to optimize the parameters of the model in an iterative fashion. For problems with massive datasets, computations are distributed to many parallel computing…

Information Theory · Computer Science 2019-03-06 Emre Ozfatura , Deniz Gunduz , Sennur Ulukus

In distributed computing, slower nodes (stragglers) usually become a bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient technique that uses principles of error-correcting codes to distribute gradient computation…

Machine Learning · Computer Science 2023-06-29 M. Nikhil Krishnan , MohammadReza Ebrahimi , Ashish Khisti

Distributed gradient descent (DGD) is an efficient way of implementing gradient descent (GD), especially for large data sets, by dividing the computation tasks into smaller subtasks and assigning to different computing servers (CSs) to be…

Information Theory · Computer Science 2018-11-29 Emre Ozfatura , Deniz Gunduz , Sennur Ulukus

We consider distributed gradient descent in the presence of stragglers. Recent work on \em gradient coding \em and \em approximate gradient coding \em have shown how to add redundancy in distributed gradient descent to guarantee convergence…

Information Theory · Computer Science 2019-05-15 Rawad Bitar , Mary Wootters , Salim El Rouayheb

Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes where distributed worker nodes…

Information Theory · Computer Science 2017-10-30 Songze Li , Seyed Mohammadreza Mousavi Kalan , A. Salman Avestimehr , Mahdi Soltanolkotabi

Gradient descent algorithms are widely used in machine learning. In order to deal with huge volume of data, we consider the implementation of gradient descent algorithms in a distributed computing setting where multiple workers compute the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-29 Haozhao Wang , Song Guo , Bin Tang , Ruixuan Li , Chengjie Li

Coded computation techniques provide robustness against straggling servers in distributed computing, with the following limitations: First, they increase decoding complexity. Second, they ignore computations carried out by straggling…

Machine Learning · Computer Science 2018-11-29 Emre Ozfatura , Sennur Ulukus , Deniz Gunduz

We consider distributed learning in the presence of slow and unresponsive worker nodes, referred to as stragglers. In order to mitigate the effect of stragglers, gradient coding redundantly assigns partial computations to the worker such…

Information Theory · Computer Science 2022-12-19 Luis Maßny , Christoph Hofmeister , Maximilian Egger , Rawad Bitar , Antonia Wachter-Zeh

Gradient descent and its many variants, including mini-batch stochastic gradient descent, form the algorithmic foundation of modern large-scale machine learning. Due to the size and scale of modern data, gradient computations are often…

Machine Learning · Statistics 2018-05-29 Zachary Charles , Dimitris Papailiopoulos

We consider a distributed learning problem in which the computation is carried out on a system consisting of a master node and multiple worker nodes. In such systems, the existence of slow-running machines called stragglers will cause a…

Information Theory · Computer Science 2019-01-16 Shunsuke Horii , Takahiro Yoshida , Manabu Kobayashi , Toshiyasu Matsushima

Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for…

Information Theory · Computer Science 2023-04-26 Qi Wang , Ying Cui , Chenglin Li , Junni Zou , Hongkai Xiong

In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-19 Maximilian Egger , Serge Kas Hanna , Rawad Bitar

In distributed machine learning (DML), the training data is distributed across multiple worker nodes to perform the underlying training in parallel. One major problem affecting the performance of DML algorithms is presence of stragglers.…

Information Theory · Computer Science 2021-05-14 Amogh Johri , Arti Yardi , Tejas Bodas

Distributed Stochastic Gradient Descent (SGD) when run in a synchronous manner, suffers from delays in waiting for the slowest learners (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can…

Machine Learning · Statistics 2018-05-11 Sanghamitra Dutta , Gauri Joshi , Soumyadip Ghosh , Parijat Dube , Priya Nagpurkar

We study scheduling of computation tasks across n workers in a large scale distributed learning problem with the help of a master. Computation and communication delays are assumed to be random, and redundant computations are assigned to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-08 Mohammad Mohammadi Amiri , Deniz Gunduz

Gradient coding is a technique for straggler mitigation in distributed learning. In this paper we design novel gradient codes using tools from classical coding theory, namely, cyclic MDS codes, which compare favorably with existing…

Information Theory · Computer Science 2019-07-09 Netanel Raviv , Itzhak Tamo , Rashish Tandon , Alexandros G. Dimakis

Within distributed learning, workers typically compute gradients on their assigned dataset chunks and send them to the parameter server (PS), which aggregates them to compute either an exact or approximate version of $\nabla L$ (gradient of…

Information Theory · Computer Science 2024-11-19 Aditya Ramamoorthy , Ruoyu Meng , Vrinda S. Girimaji

In cloud computing systems slow processing nodes, often referred to as "stragglers", can significantly extend the computation time. Recent results have shown that error correction coding can be used to reduce the effect of stragglers. In…

Information Theory · Computer Science 2018-06-28 Shahrzad Kiani , Nuwan Ferdinand , Stark C. Draper

Gradient-based distributed learning in Parameter Server (PS) computing architectures is subject to random delays due to straggling worker nodes, as well as to possible communication bottlenecks between PS and workers. Solutions have been…

Information Theory · Computer Science 2020-04-09 Jingjing Zhang , Osvaldo Simeone
‹ Prev 1 2 3 10 Next ›