English
Related papers

Related papers: Straggler Mitigation in Distributed Optimization T…

200 papers

Performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework where the dataset is…

Machine Learning · Statistics 2018-03-15 Can Karakus , Yifan Sun , Suhas Diggavi , Wotao Yin

Dealing with the shear size and complexity of today's massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck that arises in such modern distributed computing…

Information Theory · Computer Science 2019-04-03 A. Salman Avestimehr , Seyed Mohammadreza Mousavi Kalan , Mahdi Soltanolkotabi

Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for…

Information Theory · Computer Science 2023-04-26 Qi Wang , Ying Cui , Chenglin Li , Junni Zou , Hongkai Xiong

The current BigData era routinely requires the processing of large scale data on massive distributed computing clusters. Such large scale clusters often suffer from the problem of "stragglers", which are defined as slow or failed nodes. The…

Information Theory · Computer Science 2020-02-11 Aditya Ramamoorthy , Anindya Bijoy Das , Li Tang

In cloud computing systems slow processing nodes, often referred to as "stragglers", can significantly extend the computation time. Recent results have shown that error correction coding can be used to reduce the effect of stragglers. In…

Information Theory · Computer Science 2018-06-28 Shahrzad Kiani , Nuwan Ferdinand , Stark C. Draper

In this paper, we propose an optimally structured gradient coding scheme to mitigate the straggler problem in distributed learning. Conventional gradient coding methods often assume homogeneous straggler models or rely on excessive data…

Systems and Control · Electrical Eng. & Systems 2025-10-28 Heekang Song , Wan Choi

We consider distributed learning in the presence of slow and unresponsive worker nodes, referred to as stragglers. In order to mitigate the effect of stragglers, gradient coding redundantly assigns partial computations to the worker such…

Information Theory · Computer Science 2022-12-19 Luis Maßny , Christoph Hofmeister , Maximilian Egger , Rawad Bitar , Antonia Wachter-Zeh

Straggler nodes are well-known bottlenecks of distributed matrix computations which induce reductions in computation/communication speeds. A common strategy for mitigating such stragglers is to incorporate Reed-Solomon based MDS (maximum…

Information Theory · Computer Science 2023-08-24 Anindya Bijoy Das , Aditya Ramamoorthy , David J. Love , Christopher G. Brinton

Inexpensive cloud services, such as serverless computing, are often vulnerable to straggling nodes that increase end-to-end latency for distributed computation. We propose and implement simple yet principled approaches for straggler…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-22 Vipul Gupta , Dominic Carrano , Yaoqing Yang , Vaishaal Shankar , Thomas Courtade , Kannan Ramchandran

Existing gradient coding schemes introduce identical redundancy across the coordinates of gradients and hence cannot fully utilize the computation results from partial stragglers. This motivates the introduction of diverse redundancies…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-21 Qi Wang , Ying Cui , Chenglin Li , Junni Zou , Hongkai Xiong

In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-19 Maximilian Egger , Serge Kas Hanna , Rawad Bitar

In distributed computing systems, it is well recognized that worker nodes that are slow (called stragglers) tend to dominate the overall job execution time. Coded computation utilizes concepts from erasure coding to mitigate the effect of…

Information Theory · Computer Science 2018-09-18 Anindya B. Das , Li Tang , Aditya Ramamoorthy

Distributed algorithms are often beset by the straggler effect, where the slowest compute nodes in the system dictate the overall running time. Coding-theoretic techniques have been recently proposed to mitigate stragglers via algorithmic…

Machine Learning · Statistics 2017-11-21 Zachary Charles , Dimitris Papailiopoulos , Jordan Ellenberg

In distributed computing, slower nodes (stragglers) usually become a bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient technique that uses principles of error-correcting codes to distribute gradient computation…

Machine Learning · Computer Science 2023-06-29 M. Nikhil Krishnan , MohammadReza Ebrahimi , Ashish Khisti

Building on the previous work of Lee et al. and Ferdinand et al. on coded computation, we propose a sequential approximation framework for solving optimization problems in a distributed manner. In a distributed computation system, latency…

Information Theory · Computer Science 2017-10-26 Jingge Zhu , Ye Pu , Vipul Gupta , Claire Tomlin , Kannan Ramchandran

Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-20 Yuxuan Sun , Fan Zhang , Junlin Zhao , Sheng Zhou , Zhisheng Niu , Deniz Gündüz

Coded computation is a method to mitigate "stragglers" in distributed computing systems through the use of error correction coding that has lately received significant attention. First used in vector-matrix multiplication, the range of…

Information Theory · Computer Science 2018-06-28 Nuwan Ferdinand , Stark Draper

We consider the problem of stragglers in distributed computing systems. Stragglers, which are compute nodes that unpredictably slow down, often increase the completion times of tasks. One common approach to mitigating stragglers is work…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-07 Tharindu Adikari , Haider Al-Lawati , Jason Lam , Zhenhua Hu , Stark C. Draper

Optimization in distributed networks plays a central role in almost all distributed machine learning problems. In principle, the use of distributed task allocation has reduced the computational time, allowing better response rates and…

Optimization and Control · Mathematics 2020-07-28 Elie Atallah , Nazanin Rahnavard , Chinwendu Enyioha

Edge computing has recently emerged as a promising paradigm to boost the performance of distributed learning by leveraging the distributed resources at edge nodes. Architecturally, the introduction of edge nodes adds an additional…

Networking and Internet Architecture · Computer Science 2024-06-18 Weiheng Tang , Jingyi Li , Lin Chen , Xu Chen
‹ Prev 1 2 3 10 Next ›