Accelerating Machine Learning Inference with GPUs in ProtoDUNE Data Processing

Subjects: High Energy Physics - Experiment; Distributed, Parallel, and Cluster Computing; Data Analysis, Statistics and Probability
Date: 2023-10-31 (v2)

Abstract

We study the performance of a cloud-based GPU-accelerated inference server to speed up event reconstruction in neutrino data batch jobs. Using detector data from the ProtoDUNE experiment and employing the standard DUNE grid job submission tools, we attempt to reprocess the data by running several thousand concurrent grid jobs, a rate we expect to be typical of current and future neutrino physics experiments. We process most of the dataset with the GPU version of our processing algorithm and the remainder with the CPU version for timing comparisons. We find that a 100-GPU cloud-based server easily meets the processing demand, and that processing these data with the GPU version of the event processing algorithm is twice as fast as with the CPU version when compared with the newest CPUs in our sample. However, the amount of data transferred to the inference server during the GPU runs can overwhelm even the highest-bandwidth network switches unless care is taken to observe network facility limits or otherwise distribute the jobs across multiple sites. We discuss the lessons learned from this processing campaign and several avenues for future improvements.

Cite

@article{arxiv.2301.04633,
  title  = {Accelerating Machine Learning Inference with GPUs in ProtoDUNE Data Processing},
  author = {Tejin Cai and Kenneth Herner and Tingjun Yang and Michael Wang and Maria Acosta Flechas and Philip Harris and Burt Holzman and Kevin Pedro and Nhan Tran},
  journal= {arXiv preprint arXiv:2301.04633},
  year   = {2023}
}

Comments

13 pages, 9 figures, matches accepted version