Effective MPI: User-defined Datatypes and Cartesian Communicators for Zero-copy All-to-all Communication in Multidimensional Tori
Abstract
We present a non-trivial all-to-all communication algorithm for arbitrary d-dimensional tori and show how to implement it effectively in MPI. Given a factorization of the number of processes p into d factors that can be mapped onto a d-dimensional torus, we first use a Cartesian communicator to split a given p-process MPI communicator into, for each MPI process, smaller communicators spanning each of the dimensions of the torus to which the process belongs. These communicators are cached in order to avoid expensive splitting at each all-to-all operation. The all-to-all operation itself is decomposed into a sequence of MPI_Alltoall operations on the dimension-wise communicators. The non-trivial data rearrangement before and after each MPI_Alltoall call is only implicit and is effected by MPI derived datatypes. This makes the implementation of the algorithm formally zero-copy, meaning that no explicit process-local reordering of data blocks ever has to be performed. To achieve this, the algorithm employs a double-buffering scheme with modest temporary buffer requirements. Through the choice of the factorization of p and of the implementations of the component MPI_Alltoall operations, the presented implementation gives ample opportunities for algorithm tuning and adaptation to the particular high-performance system. A few selected experimental results show performance competitive with native MPI_Alltoall implementations and illustrate problems that common MPI_Alltoall implementations may have.
Cite
@article{arxiv.2605.29970,
title = {Effective MPI: User-defined Datatypes and Cartesian Communicators for Zero-copy All-to-all Communication in Multidimensional Tori},
author = {Jesper Larsson Träff},
journal= {arXiv preprint arXiv:2605.29970},
year = {2026}
}