Effective MPI: User-defined Datatypes and Cartesian Communicators for Zero-copy All-to-all Communication in Multidimensional Tori

cs.DC, 2026-05, v1

Abstract

We present, and show how to implement effectively in MPI, a non-trivial all-to-all communication algorithm for arbitrary d-dimensional tori. Given a factorization of the number of processes p into d factors that can be mapped onto a d-dimensional torus, we first use a Cartesian communicator to split a given p-process MPI communicator into, for each MPI process, d smaller communicators, one spanning each dimension of the torus to which the process belongs. These communicators are cached to avoid expensive splitting at each all-to-all operation. The all-to-all operation itself is decomposed into a sequence of d MPI_Alltoall operations on the dimension-wise communicators. The non-trivial data rearrangement before and after each MPI_Alltoall call is only implicit and is effected by MPI derived datatypes. This makes the implementation of the algorithm formally zero-copy, meaning that no explicit process-local reordering of data blocks ever has to be performed. To achieve this, the algorithm employs a double-buffering scheme with modest temporary buffer requirements. By choosing the factorization of p and selecting appropriate implementations for the component MPI_Alltoall operations, the presented implementation gives ample opportunities for algorithm tuning and adaptation to the particular high-performance system. A few select experimental results show competitive performance with native MPI_Alltoall implementations and illustrate problems that common MPI_Alltoall implementations may have.
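The decomposition into d dimension-wise all-to-all steps can be illustrated with a small simulation. The sketch below is plain Python, not MPI, and is not the paper's implementation: it models neither the derived datatypes nor the double buffering, and the function names (`coords_of`, `rank_of`, `torus_alltoall`) are hypothetical. It assumes row-major rank ordering, as used by MPI Cartesian topologies, and one plausible routing rule: in phase k, each block moves to the process in the sender's dimension-k group whose k-th coordinate matches that of the block's final destination.

```python
from math import prod


def coords_of(rank, dims):
    """Row-major coordinates of a rank in a torus with the given dimension sizes."""
    coords = []
    for size in reversed(dims):
        coords.append(rank % size)
        rank //= size
    return tuple(reversed(coords))


def rank_of(coords, dims):
    """Inverse of coords_of: row-major rank of a coordinate tuple."""
    rank = 0
    for c, size in zip(coords, dims):
        rank = rank * size + c
    return rank


def torus_alltoall(send, dims):
    """Simulate an all-to-all as len(dims) dimension-wise all-to-all phases.

    send[r][t] is the block that process r sends to process t.
    Returns recv with recv[t][r] == send[r][t].
    """
    p = prod(dims)
    # held[r]: the (source, destination, payload) blocks currently at process r.
    held = [[(r, t, send[r][t]) for t in range(p)] for r in range(p)]
    for k, size in enumerate(dims):
        nxt = [[] for _ in range(p)]
        for r in range(p):
            me = coords_of(r, dims)
            per_peer = [0] * size
            for src, dst, payload in held[r]:
                # Correct coordinate k: forward to the group member whose
                # k-th coordinate equals that of the final destination.
                j = coords_of(dst, dims)[k]
                peer = rank_of(me[:k] + (j,) + me[k + 1:], dims)
                nxt[peer].append((src, dst, payload))
                per_peer[j] += 1
            # Every peer in the group receives the same number of blocks,
            # so each phase is a regular all-to-all within the group.
            assert all(n == p // size for n in per_peer)
        held = nxt
    recv = [[None] * p for _ in range(p)]
    for r in range(p):
        for src, dst, payload in held[r]:
            assert dst == r  # after d phases every block has arrived
            recv[r][src] = payload
    return recv


dims = (2, 3, 2)  # p = 12 processes on a 2 x 3 x 2 torus
p = prod(dims)
send = [[f"{r}->{t}" for t in range(p)] for r in range(p)]
recv = torus_alltoall(send, dims)
assert all(recv[t][r] == send[r][t] for r in range(p) for t in range(p))
```

In each phase a process exchanges p/dims[k] blocks with each of the dims[k] members of its group, so every process holds exactly p blocks throughout. The simulation moves blocks explicitly between Python lists; in the approach described in the abstract, the corresponding rearrangement is expressed through MPI derived datatypes instead of explicit copies.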

Cite

@article{arxiv.2605.29970,
  title  = {Effective MPI: User-defined Datatypes and Cartesian Communicators for Zero-copy All-to-all Communication in Multidimensional Tori},
  author = {Jesper Larsson Träff},
  journal= {arXiv preprint arXiv:2605.29970},
  year   = {2026}
}