Related papers: General Coded Computing in a Probabilistic Straggl…

200 papers

One of the major challenges in using distributed learning to train complicated models with large data sets is to deal with the straggler effect. As a solution, coded computation has been recently proposed to efficiently add redundancy to the…

Information Theory · Computer Science 2021-11-02 Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali

Coded computing is a reliable and fault-tolerant mechanism for implementing large computing tasks over a distributed set of worker nodes. While a majority of coded computing frameworks address accurate computation of the target functions,…

Information Theory · Computer Science 2025-07-03 Rimpi Borah, J. Harshan

Conventional coded computing frameworks are predominantly tailored for structured computations, such as matrix multiplication and polynomial evaluation. Such tasks allow the reuse of tools and techniques from algebraic coding theory to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-13 Parsa Moradi, Hanzaleh Akbarinodehi, Mohammad Ali Maddah-Ali

In a large-scale distributed machine learning system, coded computing has attracted widespread attention since it can effectively alleviate the impact of stragglers. However, several emerging problems greatly limit the performance of coded…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-10 Houming Qiu, Kun Zhu, Nguyen Cong Luong, Dusit Niyato

Resilience against stragglers is a critical element of prediction serving systems, tasked with executing inferences on input data for a pre-trained machine-learning model. In this paper, we propose NeRCC, as a general straggler-resistant…

Machine Learning · Computer Science 2024-02-12 Parsa Moradi, Mohammad Ali Maddah-Ali

Inexpensive cloud services, such as serverless computing, are often vulnerable to straggling nodes that increase end-to-end latency for distributed computation. We propose and implement simple yet principled approaches for straggler…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-22 Vipul Gupta, Dominic Carrano, Yaoqing Yang, Vaishaal Shankar, Thomas Courtade, Kannan Ramchandran

In distributed optimization problems, a technique called gradient coding, which involves replicating data points, has been used to mitigate the effect of straggling machines. Recent work has studied approximate gradient coding, which…

Machine Learning · Statistics 2021-08-09 Margalit Glasgow, Mary Wootters

Coded computation is a framework which provides redundancy in distributed computing systems to speed up large-scale tasks. Although most existing works assume error-free scenarios in a master-worker setup, link failures are common in…

Information Theory · Computer Science 2019-01-14 Dong-Jun Han, Jy-yong Sohn, Jaekyun Moon

Coded polynomial aggregation (CPA) in distributed computing systems enables the master to directly recover a weighted aggregation of polynomial computations without individually decoding each term, thereby reducing the number of required…

Information Theory · Computer Science 2026-02-04 Xi Zhong, Jörg Kliewer, Mingyue Ji

Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a…

Machine Learning · Computer Science 2026-03-26 Parsa Moradi, Behrooz Tahmasebi, Mohammad Ali Maddah-Ali

Building on the previous work of Lee et al. and Ferdinand et al. on coded computation, we propose a sequential approximation framework for solving optimization problems in a distributed manner. In a distributed computation system, latency…

Information Theory · Computer Science 2017-10-26 Jingge Zhu, Ye Pu, Vipul Gupta, Claire Tomlin, Kannan Ramchandran

Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for…

Information Theory · Computer Science 2023-04-26 Qi Wang, Ying Cui, Chenglin Li, Junni Zou, Hongkai Xiong

Large-scale distributed computing systems face two major bottlenecks that limit their scalability: straggler delay caused by the variability of computation times at different worker nodes and communication bottlenecks caused by shuffling…

Information Theory · Computer Science 2017-07-04 Amirhossein Reisizadeh, Ramtin Pedarsani

Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes where distributed worker nodes…

Information Theory · Computer Science 2017-10-30 Songze Li, Seyed Mohammadreza Mousavi Kalan, A. Salman Avestimehr, Mahdi Soltanolkotabi

Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, suffer from slow running machines, called 'stragglers'. Gradient coding is a coding-theoretic framework to…

Information Theory · Computer Science 2019-05-01 Swanand Kadhe, O. Ozan Koyluoglu, Kannan Ramchandran

Coded computation is a method to mitigate "stragglers" in distributed computing systems through the use of error correction coding that has lately received significant attention. First used in vector-matrix multiplication, the range of…

Information Theory · Computer Science 2018-06-28 Nuwan Ferdinand, Stark Draper

It has been established that when the gradient coding problem is distributed among $n$ servers, the computation load (number of stored data partitions) of each worker is at least $s+1$ in order to resist $s$ stragglers. This scheme incurs…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-25 Sinong Wang, Jiashang Liu, Ness Shroff

Coded computation techniques provide robustness against straggling workers in distributed computing. However, most of the existing schemes require exact provisioning of the straggling behaviour and ignore the computations carried out by…

Information Theory · Computer Science 2021-12-07 Emre Ozfatura, Sennur Ulukus, Deniz Gunduz

Tensors are a fundamental operation in distributed computing, e.g., machine learning, that are commonly distributed into multiple parallel tasks for large datasets. Stragglers and other failures can severely impact the overall…

Information Theory · Computer Science 2024-10-30 Pedro Soto

In distributed computing, slower nodes (stragglers) usually become a bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient technique that uses principles of error-correcting codes to distribute gradient computation…

Machine Learning · Computer Science 2023-06-29 M. Nikhil Krishnan, MohammadReza Ebrahimi, Ashish Khisti