Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance

Distributed, Parallel, and Cluster Computing 2020-11-19 v2

Abstract

We present Task Bench, a parameterized benchmark designed to explore the performance of parallel and distributed programming systems under a variety of application scenarios. Task Bench lowers the barrier to benchmarking multiple programming systems by making the implementation for a given system orthogonal to the benchmarks themselves: every benchmark constructed with Task Bench runs on every Task Bench implementation. Furthermore, Task Bench's parameterization enables a wide variety of benchmark scenarios that distill the key characteristics of larger applications. We conduct a comprehensive study with implementations of Task Bench in 15 programming systems on up to 256 Haswell nodes of the Cori supercomputer. We introduce a novel metric, minimum effective task granularity, to study the baseline runtime overhead of each system. We show that when running at scale, 100 μs is the smallest granularity that even the most efficient systems can reliably support with current technologies. We also study each system's scalability and its ability to hide communication and mitigate load imbalance.

Cite

@article{arxiv.1908.05790,
  title  = {Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance},
  author = {Elliott Slaughter and Wei Wu and Yuankun Fu and Legend Brandenburg and Nicolai Garcia and Wilhem Kautz and Emily Marx and Kaleb S. Morris and Wonchan Lee and Qinglei Cao and George Bosilca and Seema Mirchandaney and Sean Treichler and Patrick McCormick and Alex Aiken},
  journal= {arXiv preprint arXiv:1908.05790},
  year   = {2020}
}

Comments

14 pages, 13 figures, published in SC'20: Proceedings of the Conference for High Performance Computing, Networking, Storage and Analysis