Related papers: Data-Driven Execution of Fast Multipole Methods
We present efficient algorithms to build data structures and the lists needed for fast multipole methods. The algorithms are capable of being efficiently implemented on both serial, data parallel GPU and on distributed architectures. With…
The present work attempts to integrate the independent efforts in the fast N-body community to create the fastest N-body library for many-core and heterogenous architectures. Focus is placed on low accuracy optimizations, in response to the…
Parallel real-time embedded applications can be modelled as directed acyclic graphs (DAGs) whose nodes model subtasks and whose edges model precedence constraints among subtasks. Efficiently scheduling such parallel tasks can be challenging…
Fast multipole methods (FMM) on distributed mem- ory have traditionally used a bulk-synchronous model of com- municating the local essential tree (LET) and overlapping it with computation of the local data. This could be perceived as an…
Fast algorithms for the computation of $N$-body problems can be broadly classified into mesh-based interpolation methods, and hierarchical or multiresolution methods. To this last class belongs the well-known fast multipole method (FMM),…
Among the algorithms that are likely to play a major role in future exascale computing, the fast multipole method (FMM) appears as a rising star. Our previous recent work showed scaling of an FMM on GPU clusters, with problem sizes in the…
We discuss an implementation of adaptive fast multipole methods targeting hybrid multicore CPU- and GPU-systems. From previous experiences with the computational profile of our version of the fast multipole algorithm, suitable parts are…
Exascale systems are predicted to have approximately one billion cores, assuming Gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the current parallel…
The fast multipole method (FMM) performs fast approximate kernel summation to a specified tolerance $\epsilon$ by using a hierarchical division of the domain, which groups source and receiver points into regions that satisfy local…
The Fast Multipole Method (FMM) offers an acceleration for pairwise interaction calculation, known as $N$-body problems, from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$ with $N$ particles. This has brought dramatic increase in the capability of…
Nowadays, multiprocessing is mainstream with exponentially increasing number of processors. Load balancing is, therefore, a critical operation for the efficient execution of parallel algorithms. In this paper we consider the fundamental…
The Fast Multipole Method (FMM) is well known to possess a bottleneck arising from decreasing workload on higher levels of the FMM tree [Greengard and Gropp, Comp. Math. Appl., 20(7), 1990]. We show that this potential bottleneck can be…
This paper introduces a parallel directional fast multipole method (FMM) for solving N-body problems with highly oscillatory kernels, with a focus on the Helmholtz kernel in three dimensions. This class of oscillatory kernels requires a…
We present a number of novel algorithms, based on mathematical optimization formulations, in order to solve a homogeneous multiprocessor scheduling problem, while minimizing the total energy consumption. In particular, for a system with a…
While fast multipole methods (FMMs) are in widespread use for the rapid evaluation of potential fields governed by the Laplace, Helmholtz, Maxwell or Stokes equations, their coupling to high-order quadratures for evaluating layer potentials…
The Fast Multipole Method (FMM) is an efficient numerical algorithm for computation of long-ranged forces in $N$-body problems within gravitational and electrostatic fields. This method utilizes multipole expansions of the Green's function…
This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly…
Current approaches to scheduling workloads on heterogeneous systems with specialized accelerators often rely on manual partitioning, offloading tasks with specific compute patterns to accelerators. This method requires extensive…
For the parallel computation of partial differential equations, one key is the grid partitioning. It requires that each process owns the same amount of computations, and also, the partitioning quality should be proper to reduce the…
Due to the variety and importance of applications of treecodes and FMM, the combination of algorithmic acceleration with hardware acceleration can have tremendous impact. Alas, programming these algorithms efficiently is no piece of cake.…