Multi-Node Multi-GPU Diffeomorphic Image Registration for Large-Scale Imaging Problems

Distributed, Parallel, and Cluster Computing 2020-12-25 v1 Optimization and Control

Abstract

We present a Gauss-Newton-Krylov solver for large deformation diffeomorphic image registration. We extend the publicly available CLAIRE library to multi-node multi-graphics processing unit (GPU) systems and introduce novel algorithmic modifications that significantly improve performance. Our contributions comprise (i) a new preconditioner for the reduced-space Gauss-Newton Hessian system, (ii) a highly optimized multi-node multi-GPU implementation exploiting device direct communication for the main computational kernels (interpolation, high-order finite difference operators, and fast Fourier transforms), and (iii) a comparison with state-of-the-art CPU and GPU implementations. We solve a $256^3$-resolution image registration problem in five seconds on a single NVIDIA Tesla V100, with a performance speedup of 70% compared to the state of the art. In our largest run, we register $2048^3$-resolution images (25 B unknowns; approximately $152\times$ larger than the largest problem solved in state-of-the-art GPU implementations) on 64 nodes with 256 GPUs on TACC's Longhorn system.
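
The problem sizes quoted above can be sanity-checked with a few lines of arithmetic. The sketch below assumes (the abstract does not say so explicitly) that the unknowns are a three-component vector field with one value per component at each grid point; the helper name `velocity_unknowns` is hypothetical. Under that assumption the $2048^3$ grid gives roughly 25.8 billion unknowns, consistent with the rounded "25 B" figure.

```python
def velocity_unknowns(n: int, components: int = 3) -> int:
    """Unknowns for a vector field with `components` values per point on an n^3 grid.

    Assumption: one scalar unknown per component per grid point.
    """
    return components * n**3


small = velocity_unknowns(256)    # single-GPU problem from the abstract
large = velocity_unknowns(2048)   # largest multi-node run from the abstract

print(f"256^3 grid:  {small:,} unknowns")
print(f"2048^3 grid: {large:,} unknowns (~{large / 1e9:.1f} B)")
print(f"size ratio between the two grids: {large // small}x")
print(f"GPUs per node in the largest run: {256 // 64}")
```

The ratio printed here compares only the two grids named in the abstract; it is unrelated to the $152\times$ figure, which is measured against other GPU implementations.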

Cite

@article{arxiv.2008.12820,
  title  = {Multi-Node Multi-GPU Diffeomorphic Image Registration for Large-Scale Imaging Problems},
  author = {Malte Brunn and Naveen Himthani and George Biros and Miriam Mehl and Andreas Mang},
  journal= {arXiv preprint arXiv:2008.12820},
  year   = {2020}
}

Comments

Proc ACM/IEEE Conference on Supercomputing 2020 (accepted for publication)
