Related papers: GPU Offloading in ExaHyPE Through C++ Standard Alg…

Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-based Offloading

HPC systems employ a growing variety of compute accelerators with different architectures and from different vendors. Large scientific applications are required to run efficiently across these systems but need to retain a single code-base…

Programming Languages · Computer Science 2022-12-20 Jeffrey Kelling, Sergei Bastrakov, Alexander Debus, Thomas Kluge, Matt Leinhauser, Richard Pausch, Klaus Steiniger, Jan Stephan, René Widera, Jeff Young, Michael Bussmann, Sunita Chandrasekaran, Guido Juckeland

Closing the Performance Gap with Modern C++

On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, portability of code and performance are increasingly difficult to achieve as…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-14 Thomas Heller, Hartmut Kaiser, Patrick Diehl, Dietmar Fey, Marc Alexander Schweitzer

Building an Accelerated OpenFOAM Proof-of-Concept Application using Modern C++

The modern trend in High-Performance Computing (HPC) involves the use of accelerators such as Graphics Processing Units (GPUs) alongside Central Processing Units (CPUs) to speed up numerical operations in various applications. Leading…

Mathematical Software · Computer Science 2025-07-25 Giulio Malenza, Giovanni Stabile, Filippo Spiga, Robert Birke, Marco Aldinucci

ExaHyPE: An Engine for Parallel Dynamically Adaptive Simulations of Wave Problems

ExaHyPE ("An Exascale Hyperbolic PDE Engine") is a software engine for solving systems of first-order hyperbolic partial differential equations (PDEs). Hyperbolic PDEs are typically derived from the conservation laws of physics and are…

Mathematical Software · Computer Science 2020-07-15 Anne Reinarz, Dominic E. Charrier, Michael Bader, Luke Bovard, Michael Dumbser, Kenneth Duru, Francesco Fambri, Alice-Agnes Gabriel, Jean-Matthieu Gallard, Sven Köppel, Lukas Krenz, Leonhard Rannabauer, Luciano Rezzolla, Philipp Samfass, Maurizio Tavelli, Tobias Weinzierl

Parallelizing Workload Execution in Embedded and High-Performance Heterogeneous Systems

In this paper, we introduce a software-defined framework that enables the parallel utilization of all the programmable processing resources available in heterogeneous system-on-chip (SoC) including FPGA-based hardware accelerators and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-12 Jose Nunez-Yanez, Mohammad Hosseinabady, Moslem Amiri, Andrés Rodríguez, Rafael Asenjo, Angeles Navarro, Rubén Gran-Tejero, Darío Suárez-Gracia

A New Execution Model and Executor for Adaptively Optimizing the Performance of Parallel Algorithms Using HPX Runtime System

Developing parallel algorithms efficiently requires careful management of concurrency across diverse hardware architectures. C++ executors provide a standardized interface that simplifies the development process, allowing developers to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-22 Karame Mohammadiporshokooh, Steven R. Brandt, Hartmut Kaiser

Massive parallelization and performance enhancement of an immersed boundary method based unsteady flow solver

High-fidelity simulations of unsteady fluid flow are now possible with advancements in high-performance computing hardware and software frameworks. Since computational fluid dynamics (CFD) computations are dominated by linear algebraic…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-28 Rahul Sundar, Dipanjan Majumdar, Chhote Lal Shah, Sunetra Sarkar

A Massive Data Parallel Computational Framework for Petascale/Exascale Hybrid Computer Systems

Heterogeneous systems are becoming more common on High Performance Computing (HPC) systems. Even using tools like CUDA and OpenCL it is a non-trivial task to obtain optimal performance on the GPU. Approaches to simplifying this task include…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-01-11 Marek Blazewicz, Steven R. Brandt, Peter Diener, David M. Koppelman, Krzysztof Kurowski, Frank Löffler, Erik Schnetter, Jian Tao

A Programming Model for GPU Load Balancing

We propose a GPU fine-grained load-balancing abstraction that decouples load balancing from work processing and aims to support both static and dynamic schedules with a programmable interface to implement new load-balancing schedules. Prior…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-13 Muhammad Osama, Serban D. Porumbescu, John D. Owens

A Hybrid Multi-GPU Implementation of Simplex Algorithm with CPU Collaboration

The simplex algorithm has been successfully used for many years in solving linear programming (LP) problems. Due to the intensive computations required (especially for the solution of large LP problems), parallel approaches have also…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-22 Basilis Mamalis, Marios Perlitis

CI/CD Efforts for Validation, Verification and Benchmarking OpenMP Implementations

Software developers must adapt to keep up with the changing capabilities of platforms so that they can utilize the power of High-Performance Computers (HPC), including exascale systems. OpenMP, a directive-based parallel programming model,…

Programming Languages · Computer Science 2024-08-22 Aaron Jarmusch, Felipe Cabarcas, Swaroop Pophale, Andrew Kallai, Johannes Doerfert, Luke Peyralans, Seyong Lee, Joel Denny, Sunita Chandrasekaran

Concurrent CPU-GPU Task Programming using Modern C++

In this paper, we introduce Heteroflow, a new C++ library to help developers quickly write parallel CPU-GPU programs using task dependency graphs. Heteroflow leverages the power of modern C++ and task-based approaches to enable efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-17 Tsung-Wei Huang, Yibo Lin

Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators

Over the last decade, most of the increase in computing power has been gained by advances in accelerated many-core architectures, mainly in the form of GPGPUs. While accelerators achieve phenomenal performances in various computing tasks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-16 Yehonatan Fridman, Guy Tamir, Gal Oren

Porting OpenACC to OpenMP on heterogeneous systems

This documentation is designed for beginners in Graphics Processing Unit (GPU)-programming and who want to get familiar with OpenACC and OpenMP offloading models. Here we present an overview of these two programming models as well as of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-31 Hicham Agueny

Automatic Multi-GPU Code Generation applied to Simulation of Electrical Machines

The electrical and electronic engineering has used parallel programming to solve its large scale complex problems for performance reasons. However, as parallel programming requires a non-trivial distribution of tasks and data, developers…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-07-05 Antonio Wendell De Oliveira Rodrigues, Frédéric Guyomarc'h, Jean-Luc Dekeyser, Yvonnick Le Menach

OpenMP Advisor

With the increasing diversity of heterogeneous architecture in the HPC industry, porting a legacy application to run on different architectures is a tough challenge. In this paper, we present OpenMP Advisor, a first of its kind compiler…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-11 Alok Mishra, Abid M. Malik, Meifeng Lin, Barbara Chapman

OpenCAEPoro: A Parallel Simulation Framework for Multiphase and Multicomponent Porous Media Flows

OpenCAEPoro is a parallel numerical simulation software developed in C++ for simulating multiphase and multicomponent flows in porous media. The software utilizes a set of general-purpose compositional model equations, enabling it to handle…

Mathematical Software · Computer Science 2024-06-18 Shizhe Li, Chen-Song Zhang

A Comparison of Support Vector Machines Training GPU-Accelerated Open Source Implementations

Last several years, GPUs are used to accelerate computations in many computer science domains. We focused on GPU accelerated Support Vector Machines (SVM) training with non-linear kernel functions. We had searched for all available GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-07-21 Jan Vanek, Josef Michalek, Josef Psutka

Exploration of Cryptocurrency Mining-Specific GPUs in AI Applications: A Case Study of CMP 170HX

This study systematically tests a computational power reuse scheme proposed by the open source community disabling specific instruction sets (Fused Multiply Add instructions) through CUDA source code modifications on the NVIDIA CMP 170HX…

Hardware Architecture · Computer Science 2025-05-09 Xing Kangwei

Integration of CUDA Processing within the C++ library for parallelism and concurrency (HPX)

Experience shows that on today's high performance systems the utilization of different acceleration cards in conjunction with a high utilization of all other parts of the system is difficult. Future architectures, like exascale clusters,…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-07 Patrick Diehl, Madhavan Seshadri, Thomas Heller, Hartmut Kaiser