Related papers: A Practical Algorithm Design and Evaluation for He…

200 papers

We study the optimal design of heterogeneous Coded Elastic Computing (CEC) where machines have varying computation speeds and storage. CEC, introduced by Yang et al. in 2018, is a framework that mitigates the impact of elastic events, where…

Information Theory · Computer Science 2020-08-13 Nicholas Woolsey, Rong-Rong Chen, Mingyue Ji

Elasticity is one important feature in modern cloud computing systems and can result in computation failure or significantly increase computing time. Such elasticity means that virtual machines over the cloud can be preempted under a short…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-26 Mingyue Ji, Xiang Zhang, Kai Wan

We study the optimal design of a heterogeneous coded elastic computing (CEC) network where machines have varying relative computation speeds. CEC, introduced by Yang et al., is a framework that mitigates the impact of elastic events,…

Information Theory · Computer Science 2020-01-14 Nicholas Woolsey, Rong-Rong Chen, Mingyue Ji

In 2018, Yang et al. introduced a novel and effective approach, using maximum distance separable (MDS) codes, to mitigate the impact of elasticity in cloud computing systems. This approach is referred to as coded elastic computing. Some…

Information Theory · Computer Science 2024-01-23 Xi Zhong, Joerg Kliewer, Mingyue Ji

Elasticity plays an important role in modern cloud computing systems. Elastic computing allows virtual machines (i.e., computing nodes) to be preempted when high-priority jobs arise, and also allows new virtual machines to participate in…

Information Theory · Computer Science 2024-03-04 Wenbo Huang, Xudong You, Kai Wan, Robert Caiming Qiu, Mingyue Ji

In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-21 Amirhossein Reisizadeh, Saurav Prakash, Ramtin Pedarsani, Amir Salman Avestimehr

Coded elastic computing enables virtual machines to be preempted for high-priority tasks while allowing new virtual machines to join ongoing computation seamlessly. This paper addresses coded elastic computing for matrix-matrix…

Information Theory · Computer Science 2025-01-30 Xi Zhong, Samuel Lu, Joerg Kliewer, Mingyue Ji

Elasticity is offered by cloud service providers to exploit under-utilized computing resources. The low-cost elastic nodes can leave and join any time during the computation cycle. The possibility of elastic events occurring together with…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-24 Shahrzad Kiani, Tharindu Adikari, Stark C. Draper

Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, need to overcome two limitations: delays caused by slow running machines called 'stragglers', and…

Information Theory · Computer Science 2020-05-15 Swanand Kadhe, O. Ozan Koyluoglu, Kannan Ramchandran

Over the years, hardware trends have introduced various heterogeneous compute units while also bringing network and storage bandwidths within an order of magnitude of memory subsystems. In response, developers have used increasingly exotic…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-20 Aldrin Montana, Yuanqing Xue, Jeff LeFevre, Carlos Maltzahn, Josh Stuart, Philip Kufeldt, Peter Alvaro

Gradient descent algorithms are widely used in machine learning. To deal with huge volumes of data, we consider the implementation of gradient descent algorithms in a distributed computing setting where multiple workers compute the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-29 Haozhao Wang, Song Guo, Bin Tang, Ruixuan Li, Chengjie Li

In distributed computing systems, slow working nodes, known as stragglers, can greatly extend finishing times. Coded computing is a technique that enables straggler-resistant computation. Most coded computing techniques presented to date…

Information Theory · Computer Science 2021-02-02 Shahrzad Kiani, Nuwan Ferdinand, Stark C. Draper

Performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework where the dataset is…

Machine Learning · Statistics 2018-03-15 Can Karakus, Yifan Sun, Suhas Diggavi, Wotao Yin

While performing distributed computations in today's cloud-based platforms, execution speed variations among compute nodes can significantly reduce the performance and create bottlenecks like stragglers. Coded computation techniques…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-04 Krishna Giri Narra, Zhifeng Lin, Mehrdad Kiamari, Salman Avestimehr, Murali Annavaram

Slow working nodes, known as stragglers, can greatly reduce the speed of distributed computation. Coded matrix multiplication is a recently introduced technique that enables straggler-resistant distributed multiplication of large matrices.…

Information Theory · Computer Science 2019-07-23 Shahrzad Kiani, Nuwan Ferdinand, Stark C. Draper

This paper focuses on mitigating the impact of stragglers in distributed learning systems. Unlike existing results designed for a fixed number of stragglers, we develop a new scheme called Adaptive Gradient Coding (AGC) with flexible…

Information Theory · Computer Science 2021-10-20 Hankun Cao, Qifa Yan, Xiaohu Tang, Guojun Han

We consider straggler-resilient learning. In many previous works, e.g., in the coded computing literature, straggling is modeled as random delays that are independent and identically distributed between workers. However, in many practical…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-30 Albin Severinson, Eirik Rosnes, Salim El Rouayheb, Alexandre Graell i Amat

In distributed machine learning (DML), the training data is distributed across multiple worker nodes to perform the underlying training in parallel. One major problem affecting the performance of DML algorithms is the presence of stragglers.…

Information Theory · Computer Science 2021-05-14 Amogh Johri, Arti Yardi, Tejas Bodas

Coded elastic computing, introduced by Yang et al. in 2018, is a technique designed to mitigate the impact of elasticity in cloud computing systems, where machines can be preempted or be added during computing rounds. This approach utilizes…

Information Theory · Computer Science 2025-02-03 Xi Zhong, Samuel Lu, Joerg Kliewer, Mingyue Ji

When gradient descent (GD) is scaled to many parallel workers for large scale machine learning problems, its per-iteration computation time is limited by the straggling workers. Straggling workers can be tolerated by assigning redundant…

Information Theory · Computer Science 2020-06-24 Emre Ozfatura, Sennur Ulukus, Deniz Gunduz