Related papers: Concurrent CPU-GPU Task Programming using Modern C…
Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient…
Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach. Taskflow introduces an expressive task graph programming model to assist developers in the implementation of…
Parallelization is needed everywhere, from laptops and mobile phones to supercomputers. Among parallel programming models, task-based programming has demonstrated a powerful potential and is widely used in high-performance scientific…
FastFlow is a programming environment specifically targeting cache-coherent shared-memory multi-cores. FastFlow is implemented as a stack of C++ template libraries built on top of lock-free (fence-free) synchronization mechanisms. In this…
On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, portability of code and performance are increasingly difficult to achieve as…
The modern trend in High-Performance Computing (HPC) involves the use of accelerators such as Graphics Processing Units (GPUs) alongside Central Processing Units (CPUs) to speed up numerical operations in various applications. Leading…
We present a unified programming model for heterogeneous computing systems. Such systems integrate multiple computing accelerators and memory units to deliver higher performance than CPU-centric systems. Although heterogeneous systems have…
Parallel programming remains a daunting challenge, from the struggle to express a parallel algorithm without cluttering the underlying synchronous logic, to describing which devices to employ in a calculation, to correctness. Over the…
In this paper, we introduce a software-defined framework that enables the parallel utilization of all the programmable processing resources available in heterogeneous system-on-chip (SoC) including FPGA-based hardware accelerators and…
Programming modern high-performance computing systems is challenging due to the need to efficiently program GPUs and accelerators and to handle data movement between nodes. The C++ language has been continuously enhanced in recent years…
In recent years, it has become increasingly common for high performance computers (HPC) to possess some level of heterogeneous architecture - typically in the form of GPU accelerators. In some machines these are isolated within a dedicated…
Heterogeneous nodes that combine multi-core CPUs with diverse accelerators are rapidly becoming the norm in both high-performance computing (HPC) and AI infrastructures. Exploiting these platforms, however, requires orchestrating several…
This paper consists of three parts. The first part provides a unified programming model for heterogeneous computing with CPU and accelerator (like GPU, FPGA, Google TPU, Atos QPU, and more) technologies. To some extent, this new programming…
In order to improve system performance efficiently, a number of systems choose to equip multi-core and many-core processors (such as GPUs). Due to their discrete memory these heterogeneous architectures comprise a distributed system within…
This work deals with the CPU-GPU heterogeneous code acceleration of a finite-volume CFD solver utilizing multiple CPUs and GPUs at the same time. First, a high-level description of the CFD solver called SENSEI, the discretization of SENSEI,…
We propose a CPU-GPU heterogeneous computing method for solving time-evolution partial differential equation problems many times with guaranteed accuracy, in short time-to-solution and low energy-to-solution. On a single-GH200 node, the…
The increasing use of heterogeneous embedded systems with multi-core CPUs and Graphics Processing Units (GPUs) presents important challenges in effectively exploiting pipeline, task and data-level parallelism to meet throughput requirements…
TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow has been initially designed for developing Machine Learning (ML)…
In this paper we would like to share our experience for transforming a parallel code for a Computational Fluid Dynamics (CFD) problem into a parallel version for the RedisDG workflow engine. This system is able to capture heterogeneous and…
Modern platforms used for high-performance computing (HPC) include machines with both general-purpose CPUs, and "accelerators", often in the form of graphical processing units (GPUs). StarPU is a C library to exploit such platforms. It…