Scheduling Coflows with Dependency Graph
Abstract
Applications in data-parallel computing typically consist of multiple stages. In each stage, a set of intermediate parallel data flows (Coflow) is produced and transferred between servers to enable starting of next stage. While there has been much research on scheduling isolated coflows, the dependency between coflows in multi-stage jobs has been largely ignored. In this paper, we consider scheduling coflows of multi-stage jobs represented by general DAGs (Directed Acyclic Graphs) in a shared data center network, so as to minimize the total weighted completion time of jobs. This problem is significantly more challenging than the traditional coflow scheduling, as scheduling even a single multi-stage job to minimize its completion time is shown to be NP-hard. In this paper, we propose a polynomial-time algorithm with approximation ratio of , where is the maximum number of coflows in a job and is the number of servers. For the special case that the jobs' underlying dependency graphs are rooted trees, we modify the algorithm and improve its approximation ratio. To verify the performance of our algorithms, we present simulation results using real traffic traces that show up to improvement over the prior approach. We conclude the paper by providing a result concerning an optimality gap for scheduling coflows with general DAGs.
Keywords
Cite
@article{arxiv.2012.11702,
title = {Scheduling Coflows with Dependency Graph},
author = {Mehrnoosh Shafiee and Javad Ghaderi},
journal= {arXiv preprint arXiv:2012.11702},
year = {2020}
}