Revisiting Temporal Blocking Stencil Optimizations

Lingqi Zhang; Mohamed Wahib; Peng Chen; Jintao Meng; Xiao Wang; Toshio Endo; Satoshi Matsuoka

doi:10.1145/3577193.3593716

Distributed, Parallel, and Cluster Computing 2023-05-15 v1

Abstract

Iterative stencils are widely used across the spectrum of High Performance Computing (HPC) applications. Given the prevalence of GPU-accelerated supercomputers, much effort has gone into optimizing stencil GPU kernels. Temporal blocking is an optimization that improves data locality by combining a batch of time steps and processing them together. Observing that GPUs are evolving to resemble CPUs in some aspects, we revisit temporal blocking optimizations for GPUs. We explore how temporal blocking schemes can be adapted to the new features of recent Nvidia GPUs, including large scratchpad memory, hardware prefetching, and device-wide synchronization. We propose a novel temporal blocking method, EBISU, which champions low device occupancy to drive aggressive deep temporal blocking on large tiles that are executed tile-by-tile. We compare EBISU with state-of-the-art temporal blocking libraries: STENCILGEN and AN5D. We also compare with state-of-the-art stencil auto-tuning tools that are equipped with temporal blocking optimizations: ARTEMIS and DRSTENCIL. Over a wide range of stencil benchmarks, EBISU achieves speedups of up to $2.53\times$ and a geometric mean speedup of $1.49\times$ over the best state-of-the-art performance in each stencil benchmark.
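
To make the general idea of temporal blocking concrete, the following is a minimal CPU-side sketch in Python, assuming a 1D 3-point averaging stencil with fixed boundary cells and a simple overlapped-tiling scheme. It is not EBISU and not taken from the paper; the function names (`step`, `naive`, `temporally_blocked`) and the parameters `tile` and `bt` are hypothetical and chosen only for illustration. Each tile loads a halo as wide as the number of fused time steps, advances that many steps on its local copy, and writes back only its interior, so the full array is traversed once per batch of steps rather than once per step.

```python
def step(u):
    """Apply one 3-point averaging step; the first and last cells are held fixed."""
    out = u[:]  # copy so the input list is not modified
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def naive(u, steps):
    """Reference: sweep the whole array once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def temporally_blocked(u, steps, tile=8, bt=4):
    """Overlapped tiling (illustrative assumption, not the paper's scheme).

    Time steps are processed in batches of at most `bt`. For each batch, every
    tile copies its cells plus a halo of width t on each side, advances t steps
    locally, and writes back only the tile interior. Halo cells become stale as
    the local steps proceed, but staleness moves inward by one cell per step,
    so after t steps it has not yet reached the tile interior.
    """
    n = len(u)
    done = 0
    while done < steps:
        t = min(bt, steps - done)      # time steps fused in this batch
        nxt = u[:]
        for start in range(0, n, tile):
            end = min(start + tile, n)
            lo = max(start - t, 0)     # halo, clipped at the global boundary
            hi = min(end + t, n)
            local = u[lo:hi]           # slice copy: the tile's working set
            for _ in range(t):
                local = step(local)
            nxt[start:end] = local[start - lo:end - lo]
        u = nxt
        done += t
    return u


if __name__ == "__main__":
    u0 = [float((i * 7) % 5) for i in range(37)]
    ref = naive(u0, 10)
    blk = temporally_blocked(u0, 10, tile=8, bt=4)
    print(max(abs(a - b) for a, b in zip(ref, blk)))
```

In this sketch the halo cells are recomputed redundantly by neighbouring tiles, which is the usual price of overlapped tiling; larger tiles shrink that overhead relative to useful work.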

Keywords

gpu computing computer architecture fpga accelerator

Cite

@article{arxiv.2305.07390,
  title  = {Revisiting Temporal Blocking Stencil Optimizations},
  author = {Lingqi Zhang and Mohamed Wahib and Peng Chen and Jintao Meng and Xiao Wang and Toshio Endo and Satoshi Matsuoka},
  journal= {arXiv preprint arXiv:2305.07390},
  year   = {2023}
}

Comments

This paper will be published in the 2023 International Conference on Supercomputing (ICS23).
