English
Related papers

Related papers: Multi-GPU Performance Optimization of a CFD Code u…

200 papers

This paper is focused on improving multi-GPU performance of a research CFD code on structured grids. MPI and OpenACC directives are used to scale the code up to 16 GPUs. This paper shows that using 16 P100 GPUs and 16 V100 GPUs can be…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-10 Weicheng Xue , Charles W. Jackson , Christoper J. Roy

Asynchronous tasks, when created with over-decomposition, enable automatic computation-communication overlap which can substantially improve performance and scalability. This is not only applicable to traditional CPU-based systems, but also…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-23 Jaemin Choi , David F. Richards , Laxmikant V. Kale

We introduce a distributed adaptive quadrature method that formulates multidimensional integration as a hierarchical domain decomposition problem on multi-GPU architectures. The integration domain is recursively partitioned into subdomains…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-04 Melanie Tonarelli , Simone Riva , Pietro Benedusi , Fabrizio Ferrandi , Rolf Krause

GPUs are the heart of the latest generations of supercomputers. We efficiently accelerate a compressible multiphase flow solver via OpenACC on NVIDIA and AMD Instinct GPUs. Optimization is accomplished by specifying the directive clauses…

The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core GPUs, exploiting…

This paper presents a comprehensive comparison of three dominant parallel programming models in High Performance Computing (HPC): Message Passing Interface (MPI), Open Multi-Processing (OpenMP), and Compute Unified Device Architecture…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-19 Nizar ALHafez , Ahmad Kurdi

The rapid development in computing technology has paved the way for directive-based programming models towards a principal role in maintaining software portability of performance-critical applications. Efforts on such models involve a least…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-28 Kazuaki Matsumura , Simon Garcia De Gonzalo , Antonio J. Peña

This report highlights our work on improving GPU parallelization by supporting compute nodes with multiple GPUs. However, since the default support for multi-GPUs in OpenACC is limited[6], the current implementation allows each MPI process…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-24 Saheed Bolarinwa

Modern GPU systems are constantly evolving to meet the needs of computing-intensive applications in scientific and machine learning domains. However, there is typically a gap between the hardware capacity and the achievable application…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-02 Gabin Schieffer , Ruimin Shi , Stefano Markidis , Andreas Herten , Jennifer Faj , Ivy Peng

High-fidelity simulations of unsteady fluid flow are now possible with advancements in high-performance computing hardware and software frameworks. Since computational fluid dynamics (CFD) computations are dominated by linear algebraic…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-28 Rahul Sundar , Dipanjan Majumdar , Chhote Lal Shah , Sunetra Sarkar

Hardware-aware design and optimization is crucial in exploiting emerging architectures for PDE-based computational fluid dynamics applications. In this work, we study optimizations aimed at acceleration of OpenFOAM-based applications on…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-29 Amani AlOnazi , David Keyes , Alexey Lastovetsky , Vladimir Rychkov

An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power efficient accelerators. Designing efficient applications for these systems has been troublesome in the past as…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-02 E. Calore , A. Gabbana , J. Kraus , S. F. Schifano , R. Tripiccione

Accelerated computing is widely used in high-performance computing. Therefore, it is crucial to experiment and discover how to better utilize GPUGPUs latest generations on relevant applications. In this paper, we present results and share…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-13 Baodi Shan , Mauricio Araya-Polo

Multiphase compressible flows are often characterized by a broad range of space and time scales. Thus entailing large grids and small time steps, simulations of these flows on CPU-based clusters can thus take several wall-clock days.…

In this paper we present an optimized parallel implementation of a flexible MAP decoder for synchronization error correcting codes, supporting a very wide range of code sizes and channel conditions. On mid-range GPUs we demonstrate decoding…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-26 Johann A. Briffa

Modern compute nodes in high-performance computing provide a tremendous level of parallelism and processing power. However, as arithmetic performance has been observed to increase at a faster rate relative to memory and network bandwidths,…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-11 Johannes Pekkilä , Miikka S. Väisälä , Maarit J. Käpylä , Matthias Rheinhardt , Oskar Lappi

We assess the performance of the hybrid Open Accelerator (OpenACC) and Message Passing Interface (MPI) approach for multi-graphics processing units (GPUs) accelerated thermal lattice Boltzmann (LB) simulation. The OpenACC accelerates…

Fluid Dynamics · Physics 2022-11-21 Ao Xu , Bo-Tao Li

Parallel code design is a challenging task especially when addressing petascale systems for massive parallel processing (MPP), i.e. parallel computations on several hundreds of thousands of cores. An in-house computational fluid dynamics…

Performance · Computer Science 2018-07-03 Jérôme Frisch , Ralf-Peter Mundani

Massive off-chip accesses in GPUs are the main performance bottleneck, and we divided these accesses into three types: (1) Write, (2) Data-Read, and (3) Read-Only. Besides, We find that many writes are duplicate, and the duplication can be…

Hardware Architecture · Computer Science 2024-08-20 Wei Zhao , Dan Feng , Wei Tong , Xueliang Wei , Bing Wu

The simplex algorithm has been successfully used for many years in solving linear programming (LP) problems. Due to the intensive computations required (especially for the solution of large LP problems), parallel approaches have also…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-22 Basilis Mamalis , Marios Perlitis
‹ Prev 1 2 3 10 Next ›