Exploiting Parallelism Opportunities with Deep Learning Frameworks

Yu Emma Wang; Carole-Jean Wu; Xiaodong Wang; Kim Hazelwood; David Brooks

Date: 2020-07-01 (v2). Subjects: Machine Learning; Distributed, Parallel, and Cluster Computing; Performance; Machine Learning

Abstract

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in feature-rich frameworks, however, involves a non-trivial amount of performance profiling effort and often relies on domain-specific knowledge. This paper takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights distill into a simple set of guidelines that one can use to achieve substantially higher training and inference speedups. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel and TensorFlow recommended settings by 1.29x and 1.34x, respectively.
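The abstract notes that finding a performance-optimal setting takes substantial profiling. As a hedged illustration only, and not the paper's guidelines, the sketch below brute-forces the two thread-pool sizes that frameworks such as TensorFlow expose (inter-op and intra-op parallelism), under the assumption that their product should not exceed the core count. The `measure` callback is a hypothetical stand-in for timing a real training or inference step, and the toy cost model in the usage block is invented; it has no relation to the paper's measurements. A ratio such as the 1.29x in the abstract is conventionally baseline time divided by tuned time, which is what `speedup` computes.

```python
import itertools
from typing import Callable, List, Optional, Tuple

Config = Tuple[int, int]  # (inter_op_threads, intra_op_threads)


def candidate_configs(num_cores: int) -> List[Config]:
    """Enumerate (inter-op, intra-op) thread pairs whose product fits the core budget.

    Assumption (not from the paper): oversubscribed settings, where
    inter * intra > num_cores, are excluded from the search.
    """
    configs = []
    for inter, intra in itertools.product(range(1, num_cores + 1), repeat=2):
        if inter * intra <= num_cores:
            configs.append((inter, intra))
    return configs


def tune(measure: Callable[[Config], float], num_cores: int) -> Tuple[Config, float]:
    """Profile every candidate and return the one with the lowest step time."""
    best_cfg: Optional[Config] = None
    best_time = float("inf")
    for cfg in candidate_configs(num_cores):
        t = measure(cfg)  # hypothetical: would run and time a real workload
        if t < best_time:
            best_cfg, best_time = cfg, t
    assert best_cfg is not None
    return best_cfg, best_time


def speedup(baseline_time: float, tuned_time: float) -> float:
    """Speedup of a tuned setting over a baseline setting (> 1 means faster)."""
    return baseline_time / tuned_time


if __name__ == "__main__":
    # Invented toy cost model standing in for real profiling (NOT paper data):
    # more threads shrink compute time, but each extra inter-op pool adds overhead.
    def toy_step_time(cfg: Config) -> float:
        inter, intra = cfg
        return 100.0 / (inter * intra) + 2.0 * inter

    best, best_t = tune(toy_step_time, num_cores=8)
    base_t = toy_step_time((2, 2))  # hypothetical baseline setting
    print(best, best_t, round(speedup(base_t, best_t), 2))  # prints: (1, 8) 14.5 2.0
```

In TensorFlow 2.x, a chosen pair could be applied with `tf.config.threading.set_inter_op_parallelism_threads` and `tf.config.threading.set_intra_op_parallelism_threads` before any operations execute. The number of candidates grows roughly as n ln n in the core count n, and each candidate needs a real profiling run. That cost is why a compact set of tuning rules is attractive compared with exhaustive search.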

Keywords

parallel programming, large language model inference, parallel algorithm

Cite

@article{arxiv.1908.04705,
  title  = {Exploiting Parallelism Opportunities with Deep Learning Frameworks},
  author = {Yu Emma Wang and Carole-Jean Wu and Xiaodong Wang and Kim Hazelwood and David Brooks},
  journal= {arXiv preprint arXiv:1908.04705},
  year   = {2020}
}