Related papers: A GPU-Accelerated Barycentric Lagrange Treecode

200 papers

We present the barycentric Lagrange dual tree traversal (BLDTT) fast summation method for particle interactions. The scheme replaces well-separated particle-particle interactions by adaptively chosen particle-cluster, cluster-particle, and…

Computational Physics · Physics 2021-06-02 Leighton Wilson, Nathan Vaughn, Robert Krasny

A kernel-independent treecode (KITC) is presented for fast summation of particle interactions. The method employs barycentric Lagrange interpolation at Chebyshev points to approximate well-separated particle-cluster interactions. The KITC…

Numerical Analysis · Mathematics 2021-11-24 Lei Wang, Robert Krasny, Svetlana Tlupova

I describe here the performance of a parallel treecode with individual particle timesteps. The code is based on the Barnes-Hut algorithm and runs cosmological N-body simulations on parallel machines with a distributed memory architecture…

Astrophysics · Physics 2007-05-23 R. Valdarnini

I describe here the performance of a parallel treecode with individual particle timesteps. The code is based on the Barnes-Hut algorithm and runs cosmological N-body simulations on parallel machines with a distributed memory architecture…

Astrophysics · Physics 2009-11-07 R. Valdarnini

Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale. Existing parameter synchronization protocols cannot effectively leverage available network resources in the face of ever-increasing…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-14 Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, Ion Stoica

Our formulation reveals that the reduction across the sequence axis can be efficiently computed in parallel through a tree reduction. Our algorithm, called Tree Attention, for parallelizing exact attention computation across multiple GPUs…

Machine Learning · Computer Science 2025-02-11 Vasudev Shyam, Jonathan Pilault, Emily Shepperd, Quentin Anthony, Beren Millidge

Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-01 Christian Feichtinger, Johannes Habich, Harald Koestler, Georg Hager, Ulrich Ruede, Gerhard Wellein

This paper presents an octree construction method, called Cornerstone, that facilitates global domain decomposition and interactions between particles in mesh-free numerical simulations. Our method is based on algorithms developed for 3D…

Instrumentation and Methods for Astrophysics · Physics 2023-07-14 Sebastian Keller, Aurélien Cavelan, Rubén Cabezon, Lucio Mayer, Florina M. Ciorba

To effectively control large-scale distributed systems online, model predictive control (MPC) has to swiftly solve the underlying high-dimensional optimization. There are multiple techniques applied to accelerate the solving process in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-30 Carmen Amo Alonso, Shih-Hao Tseng

We assess the performance of the hybrid Open Accelerator (OpenACC) and Message Passing Interface (MPI) approach for multi-graphics processing units (GPUs) accelerated thermal lattice Boltzmann (LB) simulation. The OpenACC accelerates…

Fluid Dynamics · Physics 2022-11-21 Ao Xu, Bo-Tao Li

We present a GPU solution for exact maximal clique enumeration (MCE) that performs a search tree traversal following the Bron-Kerbosch algorithm. Prior works on parallelizing MCE on GPUs perform a breadth-first traversal of the tree, which…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-25 Mohammad Almasri, Yen-Hsiang Chang, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-mei Hwu

Monte Carlo Tree Search (MCTS) methods have achieved great success in many Artificial Intelligence (AI) benchmarks. The in-tree operations become a critical performance bottleneck in realizing parallel MCTS on CPUs. In this work, we develop…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-25 Yuan Meng, Rajgopal Kannan, Viktor Prasanna

In this paper, we present a novel massively parallel algorithm for accelerating the decision tree building procedure on GPUs (Graphics Processing Units), which is a crucial step in Gradient Boosted Decision Tree (GBDT) and random forests…

Machine Learning · Statistics 2017-06-27 Huan Zhang, Si Si, Cho-Jui Hsieh

We present a novel, hardware-agnostic implementation strategy for lattice Boltzmann (LB) simulations, which yields massive performance on homogeneous and heterogeneous many-core platforms. Based solely on C++17 Parallel Algorithms, our…

Computational Physics · Physics 2021-05-11 Jonas Latt, Christophe Coreixas, Joël Beny

We utilize the Open Accelerator (OpenACC) approach for graphics processing unit (GPU) accelerated particle-resolved thermal lattice Boltzmann (LB) simulation. We adopt the momentum-exchange method to calculate fluid-particle interactions to…

Fluid Dynamics · Physics 2023-10-06 Ao Xu, Bo-Tao Li

Decision trees are very popular in machine learning and often achieve state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees lack a probabilistic version that encodes prior…

Artificial Intelligence · Computer Science 2022-07-27 Efthyvoulos Drousiotis, Paul G. Spirakis

A merge tree is a topological descriptor of a real-valued function. Merge trees are used in visualization and topological data analysis, either directly or as a means to another end: computing a 0-dimensional persistence diagram,…

Computational Geometry · Computer Science 2023-01-31 Arnur Nigmetov, Dmitriy Morozov

To achieve high performance on modern computers, it is vital to map algorithmic parallelism to that inherent in the hardware. From an application developer's perspective, it is also important that code can be maintained in a portable manner…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-20 Alan Gray, Kevin Stratford

Motion planning is a fundamental problem in robotics that involves generating feasible trajectories for a robot to follow. Recent advances in parallel computing, particularly through CPU and GPU architectures, have significantly reduced…

Robotics · Computer Science 2025-05-13 Jiaming Hu, Jiawei Wang, Henrik Christensen

Balanced butterfly counting, corresponding to counting balanced (2, 2)-bicliques, is a fundamental primitive in the analysis of signed bipartite graphs and provides a basis for studying higher-order structural properties such as clustering…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-27 Mekala Kiran, Apurba Das, Suman Banerjee, Tathagata Ray