Related papers: Gravitational octree code performance evaluation o…

Gravitational tree-code on graphics processing units: implementation in CUDA

We present a new very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force…

Instrumentation and Methods for Astrophysics · Physics 2010-10-15 Evghenii Gaburov, Jeroen Bédorf, Simon Portegies Zwart

A sparse octree gravitational N-body code that runs entirely on the GPU processor

We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in…

Instrumentation and Methods for Astrophysics · Physics 2012-04-11 Jeroen Bédorf, Evghenii Gaburov, Simon Portegies Zwart

GOTHIC: Gravitational oct-tree code accelerated by hierarchical time step controlling

The tree method is a widely implemented algorithm for collisionless $N$-body simulations in astrophysics well suited for GPU(s). Adopting hierarchical time stepping can accelerate $N$-body simulations; however, it is infrequently…

Instrumentation and Methods for Astrophysics · Physics 2016-11-09 Yohei Miki, Masayuki Umemura

NVIDIA Tensor Core Programmability, Performance & Precision

The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called "Tensor Core" that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-18 Stefano Markidis, Steven Wei Der Chien, Erwin Laure, Ivy Bo Peng, Jeffrey S. Vetter

Three Dimensional Pseudo-Spectral Compressible Magnetohydrodynamic GPU Code for Astrophysical Plasma Simulation

This paper presents the benchmarking and scaling studies of a GPU accelerated three dimensional compressible magnetohydrodynamic code. The code is developed keeping an eye to explain the large and intermediate scale magnetic field…

Computational Physics · Physics 2019-01-25 Rupak Mukherjee, Rajaraman Ganesh, Vinod Saini, Udaya Maurya, Nagavijayalakshmi Vydyanathan, Bharatkumar Sharma

New features of parallel implementation of N-body problems on GPU

This paper focuses on the parallel implementation of a direct $N$-body method (particle-particle algorithm) and the application of multiple GPUs for galactic dynamics simulations. Application of a hybrid OpenMP-CUDA technology is considered…

Computational Physics · Physics 2018-03-06 S. S. Khrapov, S. A. Khoperskov, A. V. Khoperskov

Bonsai: A GPU Tree-Code

We present a gravitational hierarchical N-body code that is designed to run efficiently on Graphics Processing Units (GPUs). All parts of the algorithm are executed on the GPU which eliminates the need for data transfer between the Central…

Instrumentation and Methods for Astrophysics · Physics 2012-04-12 Jeroen Bédorf, Evghenii Gaburov, Simon Portegies Zwart

8 Steps to 3.7 TFLOP/s on NVIDIA V100 GPU: Roofline Analysis and Other Tricks

Performance optimization can be a daunting task especially as the hardware architecture becomes more and more complex. This paper takes a kernel from the Materials Science code BerkeleyGW, and demonstrates a few performance analysis and…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-24 Charlene Yang

Performance analysis of parallel gravitational $N$-body codes on large GPU cluster

We compare the performance of two very different parallel gravitational $N$-body codes for astrophysical simulations on large GPU clusters, both pioneer in their own fields as well as in certain mutual scales - NBODY6++ and Bonsai. We carry…

Instrumentation and Methods for Astrophysics · Physics 2016-01-20 Siyi Huang, Rainer Spurzem, Peter Berczik

Efficiency of parallel computations of gravitational forces by TreeCode method in N-body models

Modeling of collisionless galactic systems is based on the N-body model, which requires large computational resources due to the long-range nature of gravitational forces. The most common method for calculating gravity is the TreeCode…

Computational Physics · Physics 2024-12-03 Nikolay M. Kuzmin, Danila S. Sirotin, Alexander V. Khoperskov

High Performance Direct Gravitational N-body Simulations on Graphics Processing Units -- II: An implementation in CUDA

We present the results of gravitational direct $N$-body simulations using the Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed for gaming computers. The force evaluation of the $N$-body problem is implemented…

Astrophysics · Physics 2008-11-26 Robert G. Belleman, Jeroen Bedorf, Simon Portegies Zwart

Oct-tree Method on GPU

The kd-tree is a fundamental tool in computer science. Among others, an application of the kd-tree search (oct-tree method) to fast evaluation of particle interactions and neighbor search is highly important since computational complexity…

Instrumentation and Methods for Astrophysics · Physics 2009-09-04 N. Nakasato

Fast quantum Monte Carlo on a GPU

We present a scheme for the parallelization of quantum Monte Carlo on graphical processing units, focusing on bosonic systems and variational Monte Carlo. We use asynchronous execution schemes with shared memory persistence, and obtain an…

Computational Physics · Physics 2014-12-10 Y. Lutsyshyn

High-Precision Numerical Simulations of Rotating Black Holes Accelerated by CUDA

Hardware accelerators (such as Nvidia's CUDA GPUs) have tremendous promise for computational science, because they can deliver large gains in performance at relatively low cost. In this work, we focus on the use of Nvidia's Tesla GPU for…

Computational Physics · Physics 2010-06-04 Rakesh Ginjupalli, Gaurav Khanna

Code Optimization on Kepler GPUs and Xeon Phi

Kepler GTX Titan Black and Kepler Tesla K40 are still the best GPUs for high performance computing, although Maxwell GPUs such as GTX 980 are available in the market. Hence, we measure the performance of our lattice QCD codes using the…

High Energy Physics - Lattice · Physics 2014-11-11 Yong-Chull Jang, Hwancheol Jeong, Jangho Kim, Weonjong Lee, Jeonghwan Pak, Yuree Chung

A GPU accelerated Barnes-Hut Tree Code for FLASH4

We present a GPU accelerated CUDA-C implementation of the Barnes Hut (BH) tree code for calculating the gravitational potential on octree adaptive meshes. The tree code algorithm is implemented within the FLASH4 adaptive mesh refinement…

Instrumentation and Methods for Astrophysics · Physics 2015-11-30 Gunther Lukat, Robi Banerjee

Performance characteristics of a parallel treecode

I describe here the performances of a parallel treecode with individual particle timesteps. The code is based on the Barnes-Hut algorithm and runs cosmological N-body simulations on parallel machines with a distributed memory architecture…

Astrophysics · Physics 2007-05-23 R. Valdarnini

VINE -- A numerical code for simulating astrophysical systems using particles II: Implementation and performance characteristics

We continue our presentation of VINE. We begin with a description of relevant architectural properties of the serial and shared memory parallel computers on which VINE is intended to run, and describe their influences on the design of the…

Astrophysics · Physics 2009-10-02 Andrew F. Nelson, M. Wetzstein, T. Naab

GPU Support for Automatic Generation of Finite-Differences Stencil Kernels

The growth of data to be processed in the Oil & Gas industry matches the requirements imposed by evolving algorithms based on stencil computations, such as Full Waveform Inversion and Reverse Time Migration. Graphical processing units…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-05 Vitor Hugo Mickus Rodrigues, Lucas Cavalcante, Maelso Bruno Pereira, Fabio Luporini, István Reguly, Gerard Gorman, Samuel Xavier de Souza

Computational Gravitational Dynamics with Modern Numerical Accelerators

We review the recent optimizations of gravitational $N$-body kernels for running them on graphics processing units (GPUs), on single hosts and massive parallel platforms. For each of the two main $N$-body techniques, direct summation and…

Instrumentation and Methods for Astrophysics · Physics 2014-09-22 Simon Portegies Zwart, Jeroen Bédorf