Optimizing Memory-Access Patterns for Deep Learning Accelerators

Performance 2020-03-02 v1 Computation and Language

Abstract

Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost. Modern DL accelerators are good at handling the large-scale multiply-accumulate operations that dominate DL workloads; however, it is challenging to make full use of an accelerator's compute power, because the data must be properly staged in a software-managed scratchpad memory. Failing to do so can result in significant performance loss. This paper proposes a systematic approach that leverages the polyhedral model to analyze all operators of a DL model together and minimize the number of memory accesses. Experiments show that our approach can substantially reduce the impact of the memory accesses required by common neural-network models on Inferentia, an AWS-built machine-learning inference chip that is available through Amazon EC2 Inf1 instances.
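
As a rough illustration of why analyzing operators together can reduce memory accesses, the sketch below counts element transfers between off-chip memory and a scratchpad for two chained elementwise operators (a scale followed by a ReLU), first run one operator at a time and then fused per tile. It is not the paper's polyhedral analysis: the `CountingMemory` class, the two operators, and the tile size are all hypothetical and chosen only to make the access counts visible.

```python
class CountingMemory:
    """Hypothetical off-chip buffer that counts every element moved."""

    def __init__(self, data):
        self.data = list(data)
        self.accesses = 0

    def load(self, start, stop):
        # Copy a tile into the (simulated) scratchpad.
        self.accesses += stop - start
        return self.data[start:stop]

    def store(self, start, values):
        # Write a tile from the scratchpad back to off-chip memory.
        self.accesses += len(values)
        self.data[start:start + len(values)] = values


def run_unfused(x, tile):
    """Run each operator over the whole tensor; the intermediate goes off-chip."""
    n = len(x)
    src = CountingMemory(x)
    tmp = CountingMemory([0.0] * n)
    out = CountingMemory([0.0] * n)
    for s in range(0, n, tile):          # operator 1: scale by 2
        e = min(s + tile, n)
        tmp.store(s, [2.0 * v for v in src.load(s, e)])
    for s in range(0, n, tile):          # operator 2: ReLU
        e = min(s + tile, n)
        out.store(s, [max(v, 0.0) for v in tmp.load(s, e)])
    return out.data, src.accesses + tmp.accesses + out.accesses


def run_fused(x, tile):
    """Apply both operators while the tile is still in the scratchpad."""
    n = len(x)
    src = CountingMemory(x)
    out = CountingMemory([0.0] * n)
    for s in range(0, n, tile):
        e = min(s + tile, n)
        buf = src.load(s, e)
        buf = [2.0 * v for v in buf]     # intermediate never leaves the scratchpad
        out.store(s, [max(v, 0.0) for v in buf])
    return out.data, src.accesses + out.accesses


if __name__ == "__main__":
    x = [-1.0, 2.0, -3.0, 4.0, 5.0]
    print(run_unfused(x, tile=2))
    print(run_fused(x, tile=2))
```

In this toy setting both versions produce the same result, but the unfused version moves each element four times (load input, store and reload the intermediate, store output) while the fused version moves it twice. Real accelerators add constraints such as scratchpad capacity and data layout, which this sketch ignores.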

Cite

@article{arxiv.2002.12798,
  title  = {Optimizing Memory-Access Patterns for Deep Learning Accelerators},
  author = {Hongbin Zheng and Sejong Oh and Huiqing Wang and Preston Briggs and Jiading Gai and Animesh Jain and Yizhi Liu and Rich Heaton and Randy Huang and Yida Wang},
  journal= {arXiv preprint arXiv:2002.12798},
  year   = {2020}
}

Comments

Extended abstract for a poster presented at the C4ML 2020 workshop