Related papers: Timely-Throughput Optimal Coded Computing over Clo…

Leveraging Coding Techniques for Speeding up Distributed Computing

Large scale clusters leveraging distributed computing frameworks such as MapReduce routinely process data that are on the orders of petabytes or more. The sheer size of the data precludes the processing of the data on a single computer. The…

Information Theory · Computer Science 2018-02-12 Konstantinos Konstantinidis , Aditya Ramamoorthy

Coded Computation over Heterogeneous Clusters

In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-21 Amirhossein Reisizadeh , Saurav Prakash , Ramtin Pedarsani , Amir Salman Avestimehr

Hierarchical Coded Matrix Multiplication

In distributed computing systems slow working nodes, known as stragglers, can greatly extend finishing times. Coded computing is a technique that enables straggler-resistant computation. Most coded computing techniques presented to date…

Information Theory · Computer Science 2021-02-02 Shahrzad Kiani , Nuwan Ferdinand , Stark C. Draper

Slack Squeeze Coded Computing for Adaptive Straggler Mitigation

While performing distributed computations in today's cloud-based platforms, execution speed variations among compute nodes can significantly reduce the performance and create bottlenecks like stragglers. Coded computation techniques…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-04 Krishna Giri Narra , Zhifeng Lin , Mehrdad Kiamari , Salman Avestimehr , Murali Annavaram

A New Design Framework for Heterogeneous Uncoded Storage Elastic Computing

Elasticity is one important feature in modern cloud computing systems and can result in computation failure or significantly increase computing time. Such elasticity means that virtual machines over the cloud can be preempted under a short…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-26 Mingyue Ji , Xiang Zhang , Kai Wan

Hierarchical coded elastic computing

Elasticity is offered by cloud service providers to exploit under-utilized computing resources. The low-cost elastic nodes can leave and join any time during the computation cycle. The possibility of elastic events occurring together with…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-24 Shahrzad Kiani , Tharindu Adikari , Stark C. Draper

Hierarchical Coded Matrix Multiplication

Slow working nodes, known as stragglers, can greatly reduce the speed of distributed computation. Coded matrix multiplication is a recently introduced technique that enables straggler-resistant distributed multiplication of large matrices.…

Information Theory · Computer Science 2019-07-23 Shahrzad Kiani , Nuwan Ferdinand , Stark C. Draper

Delay-Optimal Computation Task Scheduling for Mobile-Edge Computing Systems

Mobile-edge computing (MEC) emerges as a promising paradigm to improve the quality of computation experience for mobile devices. Nevertheless, the design of computation task scheduling policies for MEC systems inevitably encounters a…

Information Theory · Computer Science 2016-11-15 Juan Liu , Yuyi Mao , Jun Zhang , Khaled B. Letaief

A Unified Coding Framework for Distributed Computing with Straggling Servers

We propose a unified coded framework for distributed computing with straggling servers, by introducing a tradeoff between "latency of computation" and "load of communication" for some linear computation tasks. We show that the coded scheme…

Information Theory · Computer Science 2016-10-26 Songze Li , Mohammad Ali Maddah-Ali , A. Salman Avestimehr

Coded Computing for Resilient Distributed Computing: A Learning-Theoretic Framework

Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a…

Machine Learning · Computer Science 2026-03-26 Parsa Moradi , Behrooz Tahmasebi , Mohammad Ali Maddah-Ali

Resource allocation and routing in parallel multi-server queues with abandonments for cloud profit maximization

This paper considers a Markov decision model for profit maximization of a cloud computing service provider catering to customers submitting jobs with firm real-time random deadlines. Customers are charged on a per-job basis, receiving a…

Optimization and Control · Mathematics 2021-04-27 José Niño-Mora

Learning to Schedule: A Supervised Learning Framework for Network-Aware Scheduling of Data-Intensive Workloads

Distributed cloud environments hosting data-intensive applications often experience slowdowns due to network congestion, asymmetric bandwidth, and inter-node data shuffling. These factors are typically not captured by traditional host-level…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-21 Sankalpa Timilsina , Susmit Shannigrahi

Transition Waste Optimization for Coded Elastic Computing

Distributed computing, in which a resource-intensive task is divided into subtasks and distributed among different machines, plays a key role in solving large-scale problems. Coded computing is a recently emerging paradigm where redundancy…

Information Theory · Computer Science 2023-03-15 Hoang Dau , Ryan Gabrys , Yu-Chih Huang , Chen Feng , Quang-Hung Luu , Eidah Alzahrani , Zahir Tari

Timely-Throughput Optimal Scheduling with Prediction

Motivated by the increasing importance of providing delay-guaranteed services in general computing and communication systems, and the recent wide adoption of learning and prediction in network control, in this work, we consider a general…

Networking and Internet Architecture · Computer Science 2018-01-08 Kun Chen , Longbo Huang

On Throughput-Delay Optimal Access to Storage Clouds via Load Adaptive Coding and Chunking

Recent literature including our past work provide analysis and solutions for using (i) erasure coding, (ii) parallelism, or (iii) variable slicing/chunking (i.e., dividing an object of a specific size into a variable number of smaller…

Networking and Internet Architecture · Computer Science 2014-03-21 Guanfeng Liang , Ulas C. Kozat

Computation Scheduling for Distributed Machine Learning with Straggling Workers

We study scheduling of computation tasks across n workers in a large scale distributed learning problem with the help of a master. Computation and communication delays are assumed to be random, and redundant computations are assigned to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-08 Mohammad Mohammadi Amiri , Deniz Gunduz

Latency Analysis of Coded Computation Schemes over Wireless Networks

Large-scale distributed computing systems face two major bottlenecks that limit their scalability: straggler delay caused by the variability of computation times at different worker nodes and communication bottlenecks caused by shuffling…

Information Theory · Computer Science 2017-07-04 Amirhossein Reisizadeh , Ramtin Pedarsani

How to Optimally Allocate Resources for Coded Distributed Computing?

Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute…

Information Theory · Computer Science 2017-02-24 Qian Yu , Songze Li , Mohammad Ali Maddah-Ali , A. Salman Avestimehr

Stream Iterative Distributed Coded Computing for Learning Applications in Heterogeneous Systems

To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computations is needed. Thus, it is essential to delegate the computations among several…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-29 Homa Esfahanizadeh , Alejandro Cohen , Muriel Medard

Coded Computing and Cooperative Transmission for Wireless Distributed Matrix Multiplication

Consider a multi-cell mobile edge computing network, in which each user wishes to compute the product of a user-generated data matrix with a network-stored matrix. This is done through task offloading by means of input uploading,…

Information Theory · Computer Science 2021-01-19 Kuikui Li , Meixia Tao , Jingjing Zhang , Osvaldo Simeone