English

A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN

Emerging Technologies 2024-08-12 v1 Artificial Intelligence

Abstract

Modern Artificial Intelligence (AI) applications are increasingly utilizing multi-tenant deep neural networks (DNNs), which lead to a significant rise in computing complexity and the need for computing parallelism. ReRAM-based processing-in-memory (PIM) computing, with its high density and low power consumption characteristics, holds promising potential for supporting the deployment of multi-tenant DNNs. However, direct deployment of complex multi-tenant DNNs on exsiting ReRAM-based PIM designs poses challenges. Resource contention among different tenants can result in sever under-utilization of on-chip computing resources. Moreover, area-intensive operators and computation-intensive operators require excessively large on-chip areas and long processing times, leading to high overall latency during parallel computing. To address these challenges, we propose a novel ReRAM-based in-memory computing framework that enables efficient deployment of multi-tenant DNNs on ReRAM-based PIM designs. Our approach tackles the resource contention problems by iteratively partitioning the PIM hardware at tenant level. In addition, we construct a fine-grained reconstructed processing pipeline at the operator level to handle area-intensive operators. Compared to the direct deployments on traditional ReRAM-based PIM designs, our proposed PIM computing framework achieves significant improvements in speed (ranges from 1.75x to 60.43x) and energy(up to 1.89x).

Keywords

Cite

@article{arxiv.2408.04812,
  title  = {A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN},
  author = {Bojing Li and Duo Zhong and Xiang Chen and Chenchen Liu},
  journal= {arXiv preprint arXiv:2408.04812},
  year   = {2024}
}