Related papers: Bonsai: A GPU Tree-Code
We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in…
We compare the performance of two very different parallel gravitational $N$-body codes for astrophysical simulations on large GPU clusters, both pioneer in their own fields as well as in certain mutual scales - NBODY6++ and Bonsai. We carry…
We present a new very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force…
We present the results of gravitational direct $N$-body simulations using the Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed for gaming computers. The force evaluation of the $N$-body problem is implemented…
We review the recent optimizations of gravitational $N$-body kernels for running them on graphics processing units (GPUs), on single hosts and massive parallel platforms. For each of the two main $N$-body techniques, direct summation and…
I describe here the performances of a parallel treecode with individual particle timesteps. The code is based on the Barnes-Hut algorithm and runs cosmological N-body simulations on parallel machines with a distributed memory architecture…
Modeling of collisionless galactic systems is based on the N-body model, which requires large computational resources due to the long-range nature of gravitational forces. The most common method for calculating gravity is the TreeCode…
To assess how future progress in gravitational microlensing computation at high optical depth will rely on both hardware and software solutions, we compare a direct inverse ray-shooting code implemented on a graphics processing unit (GPU)…
Graphics Processing Units (GPUs) can speed up the numerical solution of various problems in astrophysics including the dynamical evolution of stellar systems; the performance gain can be more than a factor 100 compared to using a Central…
Hybrid computational architectures based on the joint power of Central Processing Units and Graphic Processing Units (GPUs) are becoming popular and powerful hardware tools for a wide range of simulations in biology, chemistry, engineering,…
We propose a hybrid tree algorithm for reducing calculation and communication cost of collision-less N-body simulations. The concept of our algorithm is that we split interaction force into two parts: hard-force from neighbor particles and…
The tree method is a widely implemented algorithm for collisionless $N$-body simulations in astrophysics well suited for GPU(s). Adopting hierarchical time stepping can accelerate $N$-body simulations; however, it is infrequently…
We present the smoothed-particle hydrodynamics simulation code, Bonsai-SPH, which is a continuation of our previously developed gravity-only hierarchical $N$-body code (called Bonsai). The code is optimized for Graphics Processing Unit…
We present the results of gravitational direct $N$-body simulations using the commercial graphics processing units (GPU) NVIDIA Quadro FX1400 and GeForce 8800GTX, and compare the results with GRAPE-6Af special purpose hardware. The force…
We describe the use of Graphics Processing Units (GPUs) for speeding up the code NBODY6 which is widely used for direct $N$-body simulations. Over the years, the $N^2$ nature of the direct force calculation has proved a barrier for…
We present an algorithm named "Chamomile Scheme". The scheme is fully optimized for calculating gravitational interactions on the latest programmable Graphics Processing Unit (GPU), NVIDIA GeForce8800GTX, which has (a) small but fast shared…
I describe here the performance of a parallel treecode with individual particle timesteps. The code is based on the Barnes-Hut algorithm and runs cosmological N-body simulations on parallel machines with a distributed memory architecture…
Gravitational $N$-body simulations calculate numerous interactions between particles. The tree algorithm reduces these calculations by constructing a hierarchical oct-tree structure and approximating gravitational forces on particles. Over…
In this study, the gravitational octree code originally optimized for the Fermi, Kepler, and Maxwell GPU architectures is adapted to the Volta architecture. The Volta architecture introduces independent thread scheduling requiring either…
Direct-summation N-body algorithms compute the gravitational interaction between stars in an exact way and have a computational complexity of O(N^2). Performance can be greatly enhanced via the use of special-purpose accelerator boards like…