English
Related papers

Related papers: Slack Squeeze Coded Computing for Adaptive Straggl…

200 papers

In distributed computing systems slow working nodes, known as stragglers, can greatly extend finishing times. Coded computing is a technique that enables straggler-resistant computation. Most coded computing techniques presented to date…

Information Theory · Computer Science 2021-02-02 Shahrzad Kiani , Nuwan Ferdinand , Stark C. Draper

Slow working nodes, known as stragglers, can greatly reduce the speed of distributed computation. Coded matrix multiplication is a recently introduced technique that enables straggler-resistant distributed multiplication of large matrices.…

Information Theory · Computer Science 2019-07-23 Shahrzad Kiani , Nuwan Ferdinand , Stark C. Draper

The emerging large-scale and data-hungry algorithms require the computations to be delegated from a central server to several worker nodes. One major challenge in the distributed computations is to tackle delays and failures caused by the…

Information Theory · Computer Science 2021-03-03 Alejandro Cohen , Guillaume Thiran , Homa Esfahanizadeh , Muriel Médard

In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-19 Maximilian Egger , Serge Kas Hanna , Rawad Bitar

Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-20 Yuxuan Sun , Fan Zhang , Junlin Zhao , Sheng Zhou , Zhisheng Niu , Deniz Gündüz

Elasticity is offered by cloud service providers to exploit under-utilized computing resources. The low-cost elastic nodes can leave and join any time during the computation cycle. The possibility of elastic events occurring together with…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-24 Shahrzad Kiani , Tharindu Adikari , Stark C. Draper

In cloud computing systems slow processing nodes, often referred to as "stragglers", can significantly extend the computation time. Recent results have shown that error correction coding can be used to reduce the effect of stragglers. In…

Information Theory · Computer Science 2018-06-28 Shahrzad Kiani , Nuwan Ferdinand , Stark C. Draper

Slow running or straggler tasks can significantly reduce computation speed in distributed computation. Recently, coding-theory-inspired approaches have been applied to mitigate the effect of straggling, through embedding redundancy in…

Machine Learning · Statistics 2018-01-24 Can Karakus , Yifan Sun , Suhas Diggavi , Wotao Yin

The current BigData era routinely requires the processing of large scale data on massive distributed computing clusters. Such large scale clusters often suffer from the problem of "stragglers", which are defined as slow or failed nodes. The…

Information Theory · Computer Science 2020-02-11 Aditya Ramamoorthy , Anindya Bijoy Das , Li Tang

Inexpensive cloud services, such as serverless computing, are often vulnerable to straggling nodes that increase end-to-end latency for distributed computation. We propose and implement simple yet principled approaches for straggler…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-22 Vipul Gupta , Dominic Carrano , Yaoqing Yang , Vaishaal Shankar , Thomas Courtade , Kannan Ramchandran

Straggler nodes are well-known bottlenecks of distributed matrix computations which induce reductions in computation/communication speeds. A common strategy for mitigating such stragglers is to incorporate Reed-Solomon based MDS (maximum…

Information Theory · Computer Science 2023-08-24 Anindya Bijoy Das , Aditya Ramamoorthy , David J. Love , Christopher G. Brinton

In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-21 Amirhossein Reisizadeh , Saurav Prakash , Ramtin Pedarsani , Amir Salman Avestimehr

We propose a unified coded framework for distributed computing with straggling servers, by introducing a tradeoff between "latency of computation" and "load of communication" for some linear computation tasks. We show that the coded scheme…

Information Theory · Computer Science 2016-10-26 Songze Li , Mohammad Ali Maddah-Ali , A. Salman Avestimehr

We study scheduling of computation tasks across n workers in a large scale distributed learning problem with the help of a master. Computation and communication delays are assumed to be random, and redundant computations are assigned to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-08 Mohammad Mohammadi Amiri , Deniz Gunduz

We consider distributed learning in the presence of slow and unresponsive worker nodes, referred to as stragglers. In order to mitigate the effect of stragglers, gradient coding redundantly assigns partial computations to the worker such…

Information Theory · Computer Science 2022-12-19 Luis Maßny , Christoph Hofmeister , Maximilian Egger , Rawad Bitar , Antonia Wachter-Zeh

Matrix computations are a fundamental building-block of edge computing systems, with a major recent uptick in demand due to their use in AI/ML training and inference procedures. Existing approaches for distributing matrix computations…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-12 Anindya Bijoy Das , Aditya Ramamoorthy , David J. Love , Christopher G. Brinton

This paper introduces REDC, a comprehensive strategy for offloading computational tasks within mobile Edge Networks (EN) to Distributed Computing (DC) after Rateless Encoding (RE). Despite the efficiency, reliability, and scalability…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-24 Zhongfu Guo , Xinsheng Ji , Wei You , Yu Zhao , Bai Yi , Lingwei Wang

Performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework where the dataset is…

Machine Learning · Statistics 2018-03-15 Can Karakus , Yifan Sun , Suhas Diggavi , Wotao Yin

Coded computation is a method to mitigate "stragglers" in distributed computing systems through the use of error correction coding that has lately received significant attention. First used in vector-matrix multiplication, the range of…

Information Theory · Computer Science 2018-06-28 Nuwan Ferdinand , Stark Draper

Large-scale distributed computing systems face two major bottlenecks that limit their scalability: straggler delay caused by the variability of computation times at different worker nodes and communication bottlenecks caused by shuffling…

Information Theory · Computer Science 2017-07-04 Amirhossein Reisizadeh , Ramtin Pedarsani
‹ Prev 1 2 3 10 Next ›