Related papers: A GPU-Accelerated Barycentric Lagrange Treecode

A GPU-Accelerated Fast Summation Method Based on Barycentric Lagrange Interpolation and Dual Tree Traversal

We present the barycentric Lagrange dual tree traversal (BLDTT) fast summation method for particle interactions. The scheme replaces well-separated particle-particle interactions by adaptively chosen particle-cluster, cluster-particle, and…

Computational Physics · Physics 2021-06-02 Leighton Wilson, Nathan Vaughn, Robert Krasny

A kernel-independent treecode based on barycentric Lagrange interpolation

A kernel-independent treecode (KITC) is presented for fast summation of particle interactions. The method employs barycentric Lagrange interpolation at Chebyshev points to approximate well-separated particle-cluster interactions. The KITC…

Numerical Analysis · Mathematics 2021-11-24 Lei Wang, Robert Krasny, Svetlana Tlupova

Performance characteristics of a parallel treecode

I describe here the performance of a parallel treecode with individual particle timesteps. The code is based on the Barnes-Hut algorithm and runs cosmological N-body simulations on parallel machines with a distributed memory architecture…

Astrophysics · Physics 2007-05-23 R. Valdarnini

Parallelization of a treecode

I describe here the performance of a parallel treecode with individual particle timesteps. The code is based on the Barnes-Hut algorithm and runs cosmological N-body simulations on parallel machines with a distributed memory architecture…

Astrophysics · Physics 2009-11-07 R. Valdarnini

Blink: Fast and Generic Collectives for Distributed ML

Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale. Existing parameter synchronization protocols cannot effectively leverage available network resources in the face of ever increasing…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-14 Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, Ion Stoica

Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters

Our formulation reveals that the reduction across the sequence axis can be efficiently computed in parallel through a tree reduction. Our algorithm, called Tree Attention, for parallelizing exact attention computation across multiple GPUs…

Machine Learning · Computer Science 2025-02-11 Vasudev Shyam, Jonathan Pilault, Emily Shepperd, Quentin Anthony, Beren Millidge

A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters

Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-01 Christian Feichtinger, Johannes Habich, Harald Koestler, Georg Hager, Ulrich Ruede, Gerhard Wellein

Cornerstone: Octree Construction Algorithms for Scalable Particle Simulations

This paper presents an octree construction method, called Cornerstone, that facilitates global domain decomposition and interactions between particles in mesh-free numerical simulations. Our method is based on algorithms developed for 3D…

Instrumentation and Methods for Astrophysics · Physics 2023-07-14 Sebastian Keller, Aurélien Cavelan, Rubén Cabezon, Lucio Mayer, Florina M. Ciorba

Effective GPU Parallelization of Distributed and Localized Model Predictive Control

To effectively control large-scale distributed systems online, model predictive control (MPC) has to swiftly solve the underlying high-dimensional optimization. There are multiple techniques applied to accelerate the solving process in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-30 Carmen Amo Alonso, Shih-Hao Tseng

Multi-GPU thermal lattice Boltzmann simulations using OpenACC and MPI

We assess the performance of the hybrid Open Accelerator (OpenACC) and Message Passing Interface (MPI) approach for multi-graphics processing units (GPUs) accelerated thermal lattice Boltzmann (LB) simulation. The OpenACC accelerates…

Fluid Dynamics · Physics 2022-11-21 Ao Xu, Bo-Tao Li

Parallelizing Maximal Clique Enumeration on GPUs

We present a GPU solution for exact maximal clique enumeration (MCE) that performs a search tree traversal following the Bron-Kerbosch algorithm. Prior works on parallelizing MCE on GPUs perform a breadth-first traversal of the tree, which…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-25 Mohammad Almasri, Yen-Hsiang Chang, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-mei Hwu

Accelerating Monte-Carlo Tree Search on CPU-FPGA Heterogeneous Platform

Monte Carlo Tree Search (MCTS) methods have achieved great success in many Artificial Intelligence (AI) benchmarks. The in-tree operations become a critical performance bottleneck in realizing parallel MCTS on CPUs. In this work, we develop…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-25 Yuan Meng, Rajgopal Kannan, Viktor Prasanna

GPU-acceleration for Large-scale Tree Boosting

In this paper, we present a novel massively parallel algorithm for accelerating the decision tree building procedure on GPUs (Graphics Processing Units), which is a crucial step in Gradient Boosted Decision Tree (GBDT) and random forests…

Machine Learning · Statistics 2017-06-27 Huan Zhang, Si Si, Cho-Jui Hsieh

Cross-platform programming model for many-core lattice Boltzmann simulations

We present a novel, hardware-agnostic implementation strategy for lattice Boltzmann (LB) simulations, which yields massive performance on homogeneous and heterogeneous many-core platforms. Based solely on C++17 Parallel Algorithms, our…

Computational Physics · Physics 2021-05-11 Jonas Latt, Christophe Coreixas, Joël Beny

Particle-resolved thermal lattice Boltzmann simulation using OpenACC on multi-GPUs

We utilize the Open Accelerator (OpenACC) approach for graphics processing unit (GPU) accelerated particle-resolved thermal lattice Boltzmann (LB) simulation. We adopt the momentum-exchange method to calculate fluid-particle interactions to…

Fluid Dynamics · Physics 2023-10-06 Ao Xu, Bo-Tao Li

Single MCMC Chain Parallelisation on Decision Trees

Decision trees are highly famous in machine learning and usually acquire state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees miss a probabilistic version that encodes prior…

Artificial Intelligence · Computer Science 2022-07-27 Efthyvoulos Drousiotis, Paul G. Spirakis

Fast Merge Tree Computation via SYCL

A merge tree is a topological descriptor of a real-valued function. Merge trees are used in visualization and topological data analysis, either directly or as a means to another end: computing a 0-dimensional persistence diagram,…

Computational Geometry · Computer Science 2023-01-31 Arnur Nigmetov, Dmitriy Morozov

targetDP: an Abstraction of Lattice Based Parallelism with Portable Performance

To achieve high performance on modern computers, it is vital to map algorithmic parallelism to that inherent in the hardware. From an application developer's perspective, it is also important that code can be maintained in a portable manner…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-20 Alan Gray, Kevin Stratford

cpRRTC: GPU-Parallel RRT-Connect for Constrained Motion Planning

Motion planning is a fundamental problem in robotics that involves generating feasible trajectories for a robot to follow. Recent advances in parallel computing, particularly through CPU and GPU architectures, have significantly reduced…

Robotics · Computer Science 2025-05-13 Jiaming Hu, Jiawei Wang, Henrik Christensen

Multi-core & GPU-based Balanced Butterfly Counting in Signed Bipartite Graphs

Balanced butterfly counting, corresponding to counting balanced (2, 2)-bicliques, is a fundamental primitive in the analysis of signed bipartite graphs and provides a basis for studying higher-order structural properties such as clustering…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-27 Mekala Kiran, Apurba Das, Suman Banerjee, Tathagata Ray