Exploiting Parallelism Opportunities with Deep Learning Frameworks

Machine Learning; 2020-07-01; v2; Distributed, Parallel, and Cluster Computing; Performance; Machine Learning

Abstract

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in feature-rich frameworks, however, requires a non-trivial amount of performance profiling effort and often relies on domain-specific knowledge. This paper takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights distill into a simple set of guidelines that one can use to achieve much higher training and inference speedup. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel and TensorFlow recommended settings by 1.29x and 1.34x, respectively.

Cite

@article{arxiv.1908.04705,
  title  = {Exploiting Parallelism Opportunities with Deep Learning Frameworks},
  author = {Yu Emma Wang and Carole-Jean Wu and Xiaodong Wang and Kim Hazelwood and David Brooks},
  journal= {arXiv preprint arXiv:1908.04705},
  year   = {2020}
}