Parallelization of a treecode
Abstract
I describe the performance of a parallel treecode with individual particle timesteps. The code is based on the Barnes-Hut algorithm and runs cosmological N-body simulations on distributed-memory parallel machines using the MPI message-passing library. For a configuration with a constant number of particles per processor, the scalability of the code was tested up to P=128 processors on an IBM SP4 machine. In the large P limit, the average CPU time per processor needed to solve the gravitational interactions is higher than expected from the ideal scaling relation. The processor domains are determined every large timestep by recursive orthogonal bisection, using a weighting scheme that takes into account the total particle computational load within the timestep. The numerical tests show that the load balancing efficiency of the code is high up to P=32 and decreases when P=128. In the latter case, some aspects of the code performance are found to be affected by the machine hardware, while the proposed weighting scheme can still achieve a high load balance even in the large P limit.
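The domain decomposition mentioned in the abstract can be illustrated with a minimal Python sketch of a weighted recursive orthogonal bisection. This is not the paper's implementation: the function names `orb_partition` and `load_balance`, the choice of cutting along the longest axis of the bounding box, and the definition of load balance as mean load divided by maximum load are all assumptions made for illustration. The per-particle weights stand in for the computational load accumulated over a large timestep.

```python
import numpy as np


def orb_partition(pos, work, n_domains):
    """Assign particles to n_domains (a power of two) by weighted ORB.

    pos  : (N, 3) array of particle positions
    work : (N,) array of per-particle work estimates (hypothetical weights)
    Returns an integer array giving the owning domain of each particle.
    """
    if n_domains < 1 or n_domains & (n_domains - 1):
        raise ValueError("n_domains must be a power of two")
    owner = np.zeros(len(pos), dtype=int)

    def split(idx, first, count):
        if len(idx) == 0:
            return
        if count == 1 or len(idx) == 1:
            owner[idx] = first
            return
        # Assumption: cut perpendicular to the longest bounding-box axis.
        extent = pos[idx].max(axis=0) - pos[idx].min(axis=0)
        axis = int(np.argmax(extent))
        order = idx[np.argsort(pos[idx, axis], kind="stable")]
        # Place the cut where the cumulative work first reaches one half.
        cum = np.cumsum(work[order])
        cut = int(np.searchsorted(cum, 0.5 * cum[-1])) + 1
        cut = min(max(cut, 1), len(order) - 1)
        split(order[:cut], first, count // 2)
        split(order[cut:], first + count // 2, count // 2)

    split(np.arange(len(pos)), 0, n_domains)
    return owner


def load_balance(owner, work, n_domains):
    """Mean domain load divided by maximum domain load (1.0 is perfect)."""
    loads = np.bincount(owner, weights=work, minlength=n_domains)
    return loads.mean() / loads.max()
```

Calling `orb_partition(pos, work, 8)` returns one domain index per particle, and `load_balance` then gives a single number for how evenly the estimated work is shared. Splitting by cumulative work rather than by particle count is what lets particles with short timesteps, which are advanced more often, count for more when the domain boundaries are placed.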
Cite
@article{arxiv.astro-ph/0303413,
title = {Parallelization of a treecode},
author = {R. Valdarnini},
journal = {arXiv preprint arXiv:astro-ph/0303413},
year = {2003}
}
Comments
30 pages, 3 tables, 9 figures, accepted for publication in New Astronomy