Related papers: Parallel Algorithms for Constructing Data Structur…

Efficient Parallelization of Short-Range Molecular Dynamics Simulations on Many-Core Systems

This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly…

Computational Physics · Physics 2013-11-20 R. Meyer

Pipelining the Fast Multipole Method over a Runtime System

Fast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. The high performance design of such methods usually requires to carefully tune the algorithm for both the targeted physics and the…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-06-04 Emmanuel Agullo, Béranger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner, Takahashi Toru

Dynamic autotuning of adaptive fast multipole methods on hybrid multicore CPU & GPU systems

We discuss an implementation of adaptive fast multipole methods targeting hybrid multicore CPU- and GPU-systems. From previous experiences with the computational profile of our version of the fast multipole algorithm, suitable parts are…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-09-03 Marcus Holm, Stefan Engblom, Anders Goude, Sverker Holmgren

A Tuned and Scalable Fast Multipole Method as a Preeminent Algorithm for Exascale Systems

Among the algorithms that are likely to play a major role in future exascale computing, the fast multipole method (FMM) appears as a rising star. Our previous recent work showed scaling of an FMM on GPU clusters, with problem sizes in the…

Numerical Analysis · Computer Science 2012-10-30 Rio Yokota, Lorena Barba

Design and implementation of self-adaptable parallel algorithms for scientific computing on highly heterogeneous HPC platforms

Traditional heterogeneous parallel algorithms, designed for heterogeneous clusters of workstations, are based on the assumption that the absolute speed of the processors does not depend on the size of the computational task. This assumption…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-09-15 Alexey Lastovetsky, Ravi Reddy, Vladimir Rychkov, David Clarke

On the Design and Analysis of Parallel and Distributed Algorithms

The arrival of multicore systems has enforced a new scenario in computing: parallel and distributed algorithms are fast replacing the older sequential algorithms, with many challenges of these techniques. The distributed algorithms provide…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-13 Rajendra Purohit, K R Chowdhary, S D Purohit

Fully parallel algorithm for simulating dispersion-managed wavelength-division-multiplexed optical fiber systems

An efficient numerical algorithm is presented for massively parallel simulations of dispersion-managed wavelength-division-multiplexed optical fiber systems. The algorithm is based on a weak nonlinearity approximation and independent…

Pattern Formation and Solitons · Physics 2009-11-07 P. M. Lushnikov

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

Parallel Computing Architectures for Robotic Applications: A Comprehensive Review

With the growing complexity and capability of contemporary robotic systems, the necessity of sophisticated computing solutions to efficiently handle tasks such as real-time processing, sensor integration, decision-making, and control…

Robotics · Computer Science 2025-09-09 Md Rafid Islam

Data-Driven Execution of Fast Multipole Methods

Fast multipole methods have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next-generation supercomputers. Their most common application is to accelerate N-body…

Numerical Analysis · Computer Science 2012-03-06 Hatem Ltaief, Rio Yokota

Parallel simulations for Fractional-Order Systems

In this paper, we explore how numerical calculations can be accelerated by implementing several numerical methods of fractional-order systems using parallel computing techniques. We investigate the feasibility of parallel computing…

Dynamical Systems · Mathematics 2016-11-29 A. Baban, C. Bonchiş, A. Fikl, F. Roşu

Massively Parallel Construction of Radix Tree Forests for the Efficient Sampling of Discrete Probability Distributions

We compare different methods for sampling from discrete probability distributions and introduce a new algorithm which is especially efficient on massively parallel processors, such as GPUs. The scheme preserves the distribution properties…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-02 Nikolaus Binder, Alexander Keller

Adaptive fast multipole methods on the GPU

We present a highly general implementation of fast multipole methods on graphics processing units (GPUs). Our two-dimensional double precision code features an asymmetric type of adaptive space discretization leading to a particularly…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-22 Anders Goude, Stefan Engblom

Parareal Neural Networks Emulating a Parallel-in-time Algorithm

As deep neural networks (DNNs) become deeper, the training time increases. In this perspective, multi-GPU parallel computing has become a key tool in accelerating the training of DNNs. In this paper, we introduce a novel methodology to…

Numerical Analysis · Mathematics 2024-07-08 Chang-Ock Lee, Youngkyu Lee, Jongho Park

Massively Parallel Construction of the Cell Graph

Motion planning is an important and well-studied field of robotics. A typical approach to finding a route is to construct a cell graph representing a scene and then to find a path in such a graph. In this paper we present and analyze…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-03-23 Krzysztof Kaczmarski, Paweł Rzążewski, Albert Wolant

Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-10 Mehmet Deveci, Christian Trott, Sivasankaran Rajamanickam

Treecode and fast multipole method for N-body simulation with CUDA

Due to the variety and importance of applications of treecodes and FMM, the combination of algorithmic acceleration with hardware acceleration can have tremendous impact. Alas, programming these algorithms efficiently is no piece of cake.…

Computational Physics · Physics 2012-08-14 Rio Yokota, Lorena Barba

Fast Simulation of Multicomponent Dynamic Systems

A computer simulation has to be fast to be helpful, if it is employed to study the behavior of a multicomponent dynamic system. This paper discusses modeling concepts and algorithmic techniques useful for creating such fast simulations.…

Data Structures and Algorithms · Computer Science 2007-05-23 Boris D. Lubachevsky

A Performance Model for the Communication in Fast Multipole Methods on HPC Platforms

Exascale systems are predicted to have approximately one billion cores, assuming Gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the current parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-05-27 Huda Ibeid, Rio Yokota, David Keyes

Building An Efficient Grid On GPU

Grid space partitioning is a technique to speed up queries to graphics databases. We present a parallel grid construction algorithm which can efficiently construct a structured grid on GPU hardware. Our approach is substantially faster than…

Graphics · Computer Science 2024-03-19 Vasco Costa, João M. Pereira, Joaquim Jorge