English
Related papers

Related papers: Hierarchical Coded Matrix Multiplication

200 papers

In distributed computing systems slow working nodes, known as stragglers, can greatly extend finishing times. Coded computing is a technique that enables straggler-resistant computation. Most coded computing techniques presented to date…

Information Theory · Computer Science 2021-02-02 Shahrzad Kiani , Nuwan Ferdinand , Stark C. Draper

Coded matrix multiplication is a technique to enable straggler-resistant multiplication of large matrices in distributed computing systems. In this paper, we first present a conceptual framework to represent the division of work amongst…

Information Theory · Computer Science 2019-07-23 Shahrzad Kiani , Nuwan Ferdinand , Stark C. Draper

Coded computation is a method to mitigate "stragglers" in distributed computing systems through the use of error correction coding that has lately received significant attention. First used in vector-matrix multiplication, the range of…

Information Theory · Computer Science 2018-06-28 Nuwan Ferdinand , Stark Draper

In cloud computing systems slow processing nodes, often referred to as "stragglers", can significantly extend the computation time. Recent results have shown that error correction coding can be used to reduce the effect of stragglers. In…

Information Theory · Computer Science 2018-06-28 Shahrzad Kiani , Nuwan Ferdinand , Stark C. Draper

The overall execution time of distributed matrix computations is often dominated by slow worker nodes (stragglers) within the clusters. Recently, different coding techniques have been utilized to mitigate the effect of stragglers where…

Information Theory · Computer Science 2022-06-28 Anindya Bijoy Das , Aditya Ramamoorthy

Distributed matrix computations over large clusters can suffer from the problem of slow or failed worker nodes (called stragglers) which can dominate the overall job execution time. Coded computation utilizes concepts from erasure coding to…

Information Theory · Computer Science 2021-09-27 Anindya Bijoy Das , Aditya Ramamoorthy

Straggler nodes are well-known bottlenecks of distributed matrix computations which induce reductions in computation/communication speeds. A common strategy for mitigating such stragglers is to incorporate Reed-Solomon based MDS (maximum…

Information Theory · Computer Science 2023-08-24 Anindya Bijoy Das , Aditya Ramamoorthy , David J. Love , Christopher G. Brinton

Elasticity is offered by cloud service providers to exploit under-utilized computing resources. The low-cost elastic nodes can leave and join any time during the computation cycle. The possibility of elastic events occurring together with…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-24 Shahrzad Kiani , Tharindu Adikari , Stark C. Draper

We consider the problem of stragglers in distributed computing systems. Stragglers, which are compute nodes that unpredictably slow down, often increase the completion times of tasks. One common approach to mitigating stragglers is work…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-07 Tharindu Adikari , Haider Al-Lawati , Jason Lam , Zhenhua Hu , Stark C. Draper

Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-20 Yuxuan Sun , Fan Zhang , Junlin Zhao , Sheng Zhou , Zhisheng Niu , Deniz Gündüz

Distributed matrix computations -- matrix-matrix or matrix-vector multiplications -- are well-recognized to suffer from the problem of stragglers (slow or failed worker nodes). Much of prior work in this area is (i) either sub-optimal in…

Information Theory · Computer Science 2020-06-03 Anindya B. Das , Aditya Ramamoorthy , Namrata Vaswani

The current BigData era routinely requires the processing of large scale data on massive distributed computing clusters. Such large scale clusters often suffer from the problem of "stragglers", which are defined as slow or failed nodes. The…

Information Theory · Computer Science 2020-02-11 Aditya Ramamoorthy , Anindya Bijoy Das , Li Tang

In distributed computing systems, it is well recognized that worker nodes that are slow (called stragglers) tend to dominate the overall job execution time. Coded computation utilizes concepts from erasure coding to mitigate the effect of…

Information Theory · Computer Science 2018-09-18 Anindya B. Das , Li Tang , Aditya Ramamoorthy

Distributed matrix multiplication is widely used in several scientific domains. It is well recognized that computation times on distributed clusters are often dominated by the slowest workers (called stragglers). Recent work has…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-08 Li Tang , Konstantinos Konstantinidis , Aditya Ramamoorthy

Matrix computations are a fundamental building-block of edge computing systems, with a major recent uptick in demand due to their use in AI/ML training and inference procedures. Existing approaches for distributing matrix computations…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-12 Anindya Bijoy Das , Aditya Ramamoorthy , David J. Love , Christopher G. Brinton

Edge computing has recently emerged as a promising paradigm to boost the performance of distributed learning by leveraging the distributed resources at edge nodes. Architecturally, the introduction of edge nodes adds an additional…

Networking and Internet Architecture · Computer Science 2024-06-18 Weiheng Tang , Jingyi Li , Lin Chen , Xu Chen

The emerging large-scale and data-hungry algorithms require the computations to be delegated from a central server to several worker nodes. One major challenge in the distributed computations is to tackle delays and failures caused by the…

Information Theory · Computer Science 2021-03-03 Alejandro Cohen , Guillaume Thiran , Homa Esfahanizadeh , Muriel Médard

We consider the problem of massive matrix multiplication, which underlies many data analytic applications, in a large-scale distributed system comprising a group of worker nodes. We target the stragglers' delay performance bottleneck, which…

Information Theory · Computer Science 2020-04-10 Qian Yu , Mohammad Ali Maddah-Ali , A. Salman Avestimehr

In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-21 Amirhossein Reisizadeh , Saurav Prakash , Ramtin Pedarsani , Amir Salman Avestimehr

In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-19 Maximilian Egger , Serge Kas Hanna , Rawad Bitar
‹ Prev 1 2 3 10 Next ›