RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU

Hardware Architecture 2021-10-06 v1 Artificial Intelligence Machine Learning

Abstract

As AI-based applications become pervasive, CPU vendors are starting to incorporate matrix engines within the datapath to boost efficiency. Systolic arrays have been the premier architectural choice for matrix engines in offload accelerators. However, we demonstrate that incorporating them inside CPUs can introduce under-utilization and stalls, because the limited register storage is too small to amortize the fill and drain times of the array. To address this, we propose RASA, a Register-Aware Systolic Array. We develop techniques that divide an execution stage into several sub-stages and overlap instructions so that they run concurrently and hide these overheads. RASA-based designs improve performance significantly with negligible area and power overhead.

Cite

@article{arxiv.2110.01752,
  title  = {RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU},
  author = {Geonhwa Jeong and Eric Qin and Ananda Samajdar and Christopher J. Hughes and Sreenivas Subramoney and Hyesoon Kim and Tushar Krishna},
  journal= {arXiv preprint arXiv:2110.01752},
  year   = {2021}
}

Comments

This paper has been accepted to DAC 2021.