Practical Size-based Scheduling for MapReduce Workloads

Distributed, Parallel, and Cluster Computing 2013-05-06 v2

Abstract

We present the Hadoop Fair Sojourn Protocol (HFSP) scheduler, which implements a size-based scheduling discipline for Hadoop. The benefits of size-based scheduling disciplines are well recognized in a variety of contexts (computer networks, operating systems, etc.), yet their practical implementation for a system such as Hadoop raises a number of important challenges. With HFSP, which is available as an open-source project, we address issues related to job size estimation and resource management, and we study the effects of a variety of preemption strategies. Although the architecture underlying HFSP is suitable for any size-based scheduling discipline, in this work we revisit and extend the Fair Sojourn Protocol, which solves problems related to job starvation that affect FIFO, Processor Sharing and a range of size-based disciplines. Our experiments, in which we compare HFSP to standard Hadoop schedulers, point to a significant decrease in average job sojourn times (a metric that accounts for the total time a job spends in the system, including waiting and service times) for realistic workloads that we generate according to production traces available in the literature.
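
To make the sojourn-time metric concrete, the following is a minimal sketch, not HFSP or the Fair Sojourn Protocol itself. It assumes a single server, three hypothetical jobs that all arrive at time 0, exactly known job sizes, and no preemption, and it compares the mean sojourn time of FIFO order with that of a simple shortest-job-first order. The function name `mean_sojourn` and the job sizes are illustrative assumptions.

```python
from itertools import accumulate


def mean_sojourn(sizes):
    """Mean sojourn time for jobs that all arrive at time 0 and
    run to completion one at a time, in the order given."""
    # With arrival at time 0, a job's sojourn time equals its completion
    # time, which is the running sum of the sizes served so far.
    completions = list(accumulate(sizes))
    return sum(completions) / len(completions)


# Hypothetical job sizes (arbitrary time units), listed in arrival order.
sizes = [10, 1, 2]

fifo = mean_sojourn(sizes)          # serve in arrival order
sjf = mean_sojourn(sorted(sizes))   # serve the smallest job first

print(f"FIFO mean sojourn: {fifo:.2f}")
print(f"Size-based mean sojourn: {sjf:.2f}")
```

In this toy setting the large job finishes at the same time under both orders, while the two small jobs no longer wait behind it, which is the intuition behind size-based disciplines. The sketch ignores size estimation errors, multiple task slots, and starvation, which are the practical issues the abstract says HFSP addresses.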

Cite

@article{arxiv.1302.2749,
  title  = {Practical Size-based Scheduling for MapReduce Workloads},
  author = {Mario Pastorelli and Antonio Barbuzzi and Damiano Carra and Matteo Dell'Amico and Pietro Michiardi},
  journal= {arXiv preprint arXiv:1302.2749},
  year   = {2013}
}

Comments

12 pages, 8 figures
