English
Related papers

Related papers: Optimal Load Allocation for Coded Distributed Comp…

200 papers

Coded distributed computing framework enables large-scale machine learning (ML) models to be trained efficiently in a distributed manner, while mitigating the straggler effect. In this work, we consider a multi-task assignment problem in a…

Information Theory · Computer Science 2019-05-21 Yuxuan Sun , Junlin Zhao , Sheng Zhou , Deniz Gündüz

Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-20 Yuxuan Sun , Fan Zhang , Junlin Zhao , Sheng Zhou , Zhisheng Niu , Deniz Gündüz

The emerging large-scale and data-hungry algorithms require the computations to be delegated from a central server to several worker nodes. One major challenge in the distributed computations is to tackle delays and failures caused by the…

Information Theory · Computer Science 2021-03-03 Alejandro Cohen , Guillaume Thiran , Homa Esfahanizadeh , Muriel Médard

To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computations is needed. Thus, it is essential to delegate the computations among several…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-29 Homa Esfahanizadeh , Alejandro Cohen , Muriel Medard

We consider the recently proposed Coded Distributed Computing (CDC) framework that leverages carefully designed redundant computations to enable coding opportunities that substantially reduce the communication load of distributed computing.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-04 Mehrdad Kiamari , Chenwei Wang , A. Salman Avestimehr

Gradient descent algorithms are widely used in machine learning. In order to deal with huge volume of data, we consider the implementation of gradient descent algorithms in a distributed computing setting where multiple workers compute the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-29 Haozhao Wang , Song Guo , Bin Tang , Ruixuan Li , Chengjie Li

In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-21 Amirhossein Reisizadeh , Saurav Prakash , Ramtin Pedarsani , Amir Salman Avestimehr

In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-19 Maximilian Egger , Serge Kas Hanna , Rawad Bitar

The problem of minimizing mean response time of generic jobs submitted to a heterogenous distributed computer systems is considered in this paper. A static load balancing strategy, in which decision of redistribution of loads does not…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-11-09 S. A. Mondal

We consider distributed learning in the presence of slow and unresponsive worker nodes, referred to as stragglers. In order to mitigate the effect of stragglers, gradient coding redundantly assigns partial computations to the worker such…

Information Theory · Computer Science 2022-12-19 Luis Maßny , Christoph Hofmeister , Maximilian Egger , Rawad Bitar , Antonia Wachter-Zeh

Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute…

Information Theory · Computer Science 2017-02-24 Qian Yu , Songze Li , Mohammad Ali Maddah-Ali , A. Salman Avestimehr

Large-scale distributed computing systems face two major bottlenecks that limit their scalability: straggler delay caused by the variability of computation times at different worker nodes and communication bottlenecks caused by shuffling…

Information Theory · Computer Science 2017-07-04 Amirhossein Reisizadeh , Ramtin Pedarsani

Owing to data-intensive large-scale applications, distributed computation systems have gained significant recent interest, due to their ability of running such tasks over a large number of commodity nodes in a time efficient manner. One of…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-23 Mohamed A. Attia , Ravi Tandon

Coding for distributed computing supports low-latency computation by relieving the burden of straggling workers. While most existing works assume a simple master-worker model, we consider a hierarchical computational structure consisting of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-16 Hyegyeong Park , Kangwook Lee , Jy-yong Sohn , Changho Suh , Jaekyun Moon

Edge computing has recently emerged as a promising paradigm to boost the performance of distributed learning by leveraging the distributed resources at edge nodes. Architecturally, the introduction of edge nodes adds an additional…

Networking and Internet Architecture · Computer Science 2024-06-18 Weiheng Tang , Jingyi Li , Lin Chen , Xu Chen

We propose a unified coded framework for distributed computing with straggling servers, by introducing a tradeoff between "latency of computation" and "load of communication" for some linear computation tasks. We show that the coded scheme…

Information Theory · Computer Science 2016-10-26 Songze Li , Mohammad Ali Maddah-Ali , A. Salman Avestimehr

Streaming applications frequently encounter skewed workloads and execute on heterogeneous clusters. Optimal resource utilization in such adverse conditions becomes a challenge, as it requires inferring the resource capacities and input…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-03 Muhammad Anis Uddin Nasir , Hiroshi Horii , Marco Serafini , Nicolas Kourtellis , Rudy Raymond , Sarunas Girdzijauskas , Takayuki Osogami

Master-worker distributed computing systems use task replication in order to mitigate the effect of slow workers, known as stragglers. Tasks are grouped into batches and assigned to one or more workers for execution. We first consider the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-29 Amir Behrouzi-Far , Emina Soljanin

Coded distributed computing has been considered as a promising technique which makes large-scale systems robust to the "straggler" workers. Yet, practical system models for distributed computing have not been available that reflect the…

Information Theory · Computer Science 2019-01-17 Muah Kim , Jy-yong Sohn , Jaekyun Moon

In this paper, we propose an optimally structured gradient coding scheme to mitigate the straggler problem in distributed learning. Conventional gradient coding methods often assume homogeneous straggler models or rely on excessive data…

Systems and Control · Electrical Eng. & Systems 2025-10-28 Heekang Song , Wan Choi
‹ Prev 1 2 3 10 Next ›