Effective MPI: User-defined Datatypes and Cartesian Communicators for Zero-copy All-to-all Communication in Multidimensional Tori

Authors: Jesper Larsson Träff

cs.DC · 2026-05 · v1

Abstract

We present and show how to implement a non-trivial all-to-all communication algorithm for arbitrary $d$-dimensional tori effectively in MPI. Given a factorization of the number of processes $p$ into $d$ factors that can be mapped onto a $d$-dimensional torus, we first use a Cartesian communicator to split a given $p$-process MPI communicator into, for each MPI process, $d$ smaller communicators, one for each dimension of the torus to which the process belongs. These communicators are cached in order to avoid expensive splitting at each all-to-all operation. The all-to-all operation itself is decomposed into a sequence of $d$ MPI_Alltoall operations on the dimension-wise communicators. The non-trivial data rearrangement before and after each MPI_Alltoall call is implicit only and is effected by MPI derived datatypes. This makes the implementation of the algorithm formally zero-copy, meaning that no explicit process-local reordering of data blocks ever has to be performed. To achieve this, the algorithm employs a double-buffering scheme with modest temporary buffer requirements. By choosing the factorization of $p$ and selecting appropriate implementations for the component MPI_Alltoall operations, the presented implementation gives ample opportunities for algorithm tuning and adaptation to the particular high-performance system. A few select experimental results show competitive performance with native MPI_Alltoall implementations and illustrate problems that common MPI_Alltoall implementations may have.
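The decomposition into $d$ dimension-wise all-to-all steps can be illustrated with a small sequential simulation. The sketch below is plain Python, not MPI, and it is not the author's implementation: it models neither the derived datatypes nor the double buffering, and the helper names (`coords_of`, `rank_of`, `torus_alltoall`), the row-major rank ordering, and the order in which dimensions are processed are all assumptions made for illustration. In phase $k$, each simulated process forwards every block it holds to the process in its dimension-$k$ group whose $k$-th coordinate matches that of the block's final destination.

```python
from math import prod


def coords_of(rank, dims):
    """Row-major coordinates of `rank` in a torus with sizes `dims` (assumed ordering)."""
    coords = []
    for size in reversed(dims):
        coords.append(rank % size)
        rank //= size
    return tuple(reversed(coords))


def rank_of(coords, dims):
    """Inverse of coords_of: row-major rank of the given coordinates."""
    rank = 0
    for c, size in zip(coords, dims):
        rank = rank * size + c
    return rank


def torus_alltoall(dims):
    """Simulate an all-to-all as len(dims) dimension-wise exchange phases.

    Returns a list where entry r holds the (source, destination) blocks
    that end up at process r.
    """
    p = prod(dims)
    # Initially process r holds one block for every destination.
    held = [[(r, dest) for dest in range(p)] for r in range(p)]
    for k in range(len(dims)):
        new_held = [[] for _ in range(p)]
        for r in range(p):
            here = coords_of(r, dims)
            for src, dest in held[r]:
                # Only coordinate k changes, so the block stays inside the
                # dimension-k group of process r (one sub-communicator).
                target = list(here)
                target[k] = coords_of(dest, dims)[k]
                new_held[rank_of(target, dims)].append((src, dest))
        # Each phase is a regular exchange: every process again holds p blocks.
        assert all(len(blocks) == p for blocks in new_held)
        held = new_held
    return held


if __name__ == "__main__":
    dims = (2, 3, 2)  # hypothetical factorization of p = 12
    result = torus_alltoall(dims)
    p = prod(dims)
    ok = all(sorted(result[r]) == [(s, r) for s in range(p)] for r in range(p))
    print("all blocks delivered:", ok)
```

In this model every process sends the same number of blocks, $p$ divided by the size of dimension $k$, to each member of its dimension-$k$ group, which is why each phase can be expressed as a regular all-to-all on that group.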

Cite

@article{arxiv.2605.29970,
  title  = {Effective MPI: User-defined Datatypes and Cartesian Communicators for Zero-copy All-to-all Communication in Multidimensional Tori},
  author = {Jesper Larsson Träff},
  journal= {arXiv preprint arXiv:2605.29970},
  year   = {2026}
}
