English
Related papers

Related papers: Coded Computation over Heterogeneous Clusters

200 papers

In distributed computing systems slow working nodes, known as stragglers, can greatly extend finishing times. Coded computing is a technique that enables straggler-resistant computation. Most coded computing techniques presented to date…

Information Theory · Computer Science 2021-02-02 Shahrzad Kiani , Nuwan Ferdinand , Stark C. Draper

Slow working nodes, known as stragglers, can greatly reduce the speed of distributed computation. Coded matrix multiplication is a recently introduced technique that enables straggler-resistant distributed multiplication of large matrices.…

Information Theory · Computer Science 2019-07-23 Shahrzad Kiani , Nuwan Ferdinand , Stark C. Draper

Recently, coding has been a useful technique to mitigate the effect of stragglers in distributed computing. However, coding in this context has been mainly explored under the assumption of homogeneous workers, although the real-world…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-18 DaeJin Kim , Hyegyeong Park , Junkyun Choi

We consider the recently proposed Coded Distributed Computing (CDC) framework that leverages carefully designed redundant computations to enable coding opportunities that substantially reduce the communication load of distributed computing.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-04 Mehrdad Kiamari , Chenwei Wang , A. Salman Avestimehr

Coded distributed computing framework enables large-scale machine learning (ML) models to be trained efficiently in a distributed manner, while mitigating the straggler effect. In this work, we consider a multi-task assignment problem in a…

Information Theory · Computer Science 2019-05-21 Yuxuan Sun , Junlin Zhao , Sheng Zhou , Deniz Gündüz

Our extensive real measurements over Amazon EC2 show that the virtual instances often have different computing speeds even if they share the same configurations. This motivates us to study heterogeneous Coded Storage Elastic Computing…

Information Theory · Computer Science 2021-09-17 Nicholas Woolsey , Joerg Kliewer , Rong-Rong Chen , Mingyue Ji

Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems there are several types of noise that can affect the performance of distributed machine learning algorithms -- straggler nodes,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-30 Kangwook Lee , Maximilian Lam , Ramtin Pedarsani , Dimitris Papailiopoulos , Kannan Ramchandran

In recent years, coded distributed computing (CDC) has attracted significant attention, because it can efficiently facilitate many delay-sensitive computation tasks against unexpected latencies in distributed computing systems. Despite such…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-12 Baoqian Wang , Junfei Xie , Kejie Lu , Yan Wan , Shengli Fu

To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computations is needed. Thus, it is essential to delegate the computations among several…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-29 Homa Esfahanizadeh , Alejandro Cohen , Muriel Medard

Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-20 Yuxuan Sun , Fan Zhang , Junlin Zhao , Sheng Zhou , Zhisheng Niu , Deniz Gündüz

The emerging large-scale and data-hungry algorithms require the computations to be delegated from a central server to several worker nodes. One major challenge in the distributed computations is to tackle delays and failures caused by the…

Information Theory · Computer Science 2021-03-03 Alejandro Cohen , Guillaume Thiran , Homa Esfahanizadeh , Muriel Médard

Large scale clusters leveraging distributed computing frameworks such as MapReduce routinely process data that are on the orders of petabytes or more. The sheer size of the data precludes the processing of the data on a single computer. The…

Information Theory · Computer Science 2018-02-12 Konstantinos Konstantinidis , Aditya Ramamoorthy

Large-scale machine learning and data mining applications require computer systems to perform massive matrix-vector and matrix-matrix multiplication operations that need to be parallelized across multiple nodes. The presence of straggling…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-01 Ankur Mallick , Malhar Chaudhari , Utsav Sheth , Ganesh Palanikumar , Gauri Joshi

In modern distributed computing systems, unpredictable and unreliable infrastructures result in high variability of computing resources. Meanwhile, there is significantly increasing demand for timely and event-driven services with deadline…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-12 Chien-Sheng Yang , Ramtin Pedarsani , A. Salman Avestimehr

While performing distributed computations in today's cloud-based platforms, execution speed variations among compute nodes can significantly reduce the performance and create bottlenecks like stragglers. Coded computation techniques…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-04 Krishna Giri Narra , Zhifeng Lin , Mehrdad Kiamari , Salman Avestimehr , Murali Annavaram

Performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework where the dataset is…

Machine Learning · Statistics 2018-03-15 Can Karakus , Yifan Sun , Suhas Diggavi , Wotao Yin

Elasticity is offered by cloud service providers to exploit under-utilized computing resources. The low-cost elastic nodes can leave and join any time during the computation cycle. The possibility of elastic events occurring together with…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-24 Shahrzad Kiani , Tharindu Adikari , Stark C. Draper

In distributed machine learning (DML), the training data is distributed across multiple worker nodes to perform the underlying training in parallel. One major problem affecting the performance of DML algorithms is presence of stragglers.…

Information Theory · Computer Science 2021-05-14 Amogh Johri , Arti Yardi , Tejas Bodas

Edge computing is emerging as a new paradigm to allow processing data at the edge of the network, where data is typically generated and collected, by exploiting multiple devices at the edge collectively. However, exploiting the potential of…

Information Theory · Computer Science 2021-06-17 Elahe Vedadi , Hulya Seferoglu

Gradient descent algorithms are widely used in machine learning. In order to deal with huge volume of data, we consider the implementation of gradient descent algorithms in a distributed computing setting where multiple workers compute the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-29 Haozhao Wang , Song Guo , Bin Tang , Ruixuan Li , Chengjie Li
‹ Prev 1 2 3 10 Next ›