Related papers: HDArray: Parallel Array Interface for Distributed …
The pervasive adoption of Deep Learning (DL) and Graph Processing (GP) makes it a de facto requirement to build large-scale clusters of heterogeneous accelerators including GPUs and FPGAs. The OpenCL programming framework can be used on the…
We present HiCR, a model to represent the semantics of distributed heterogeneous applications and runtime systems. The model describes a minimal set of abstract operations to enable hardware topology discovery, kernel execution, memory…
Heterogeneous computing platforms consisting of general purpose processors (GPPs) and graphics processing units (GPUs) have become commonplace in personal mobile devices and embedded systems. For years, programming of these platforms was…
Heterogeneous systems have become one of the most common architectures today, thanks to their excellent performance and energy consumption. However, due to their heterogeneity they are very complex to program and even more to achieve…
Parallel programming models can encourage performance portability by moving the responsibility for work assignment and data distribution from the programmer to a runtime system. However, analyzing the resulting implicit memory allocations,…
Heterogeneous computing is the strategy of deploying multiple types of processing elements within a single workflow, and allowing each to perform the tasks to which is best suited. To fully harness the power of heterogeneity, we want to be…
High level programming languages and GPU accelerators are powerful enablers for a wide range of applications. Achieving scalable vertical (within a compute node), horizontal (across compute nodes), and temporal (over different generations…
In this paper, we introduce a software-defined framework that enables the parallel utilization of all the programmable processing resources available in heterogeneous system-on-chip (SoC) including FPGA-based hardware accelerators and…
This paper consists of three parts. The first part provides a unified programming model for heterogeneous computing with CPU and accelerator (like GPU, FPGA, Google TPU, Atos QPU, and more) technologies. To some extent, this new programming…
Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedding systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential…
Graphics processing units (GPU) had evolved from a specialized hardware capable to render high quality graphics in games to a commodity hardware for effective processing blocks of data in a parallel schema. This evolution is particularly…
The electrical and electronic engineering has used parallel programming to solve its large scale complex problems for performance reasons. However, as parallel programming requires a non-trivial distribution of tasks and data, developers…
Heterogeneity is omnipresent in today's commodity computational systems, which comprise at least one multi-core Central Processing Unit (CPU) and one Graphics Processing Unit (GPU). Nonetheless, all this computing power is not being…
Arrival of multicore systems has enforced a new scenario in computing, the parallel and distributed algorithms are fast replacing the older sequential algorithms, with many challenges of these techniques. The distributed algorithms provide…
In the past decade, high performance compute capabilities exhibited by heterogeneous GPGPU platforms have led to the popularity of data parallel programming languages such as CUDA and OpenCL. Such languages, however, involve a steep…
Hardware heterogeneity is here to stay for high-performance computing. Large-scale systems are currently equipped with multiple GPU accelerators per compute node and are expected to incorporate more specialized hardware in the future. This…
Big data streaming applications require utilization of heterogeneous parallel computing systems, which may comprise multiple multi-core CPUs and many-core accelerating devices such as NVIDIA GPUs and Intel Xeon Phis. Programming such…
GPUs are now used for a wide range of problems within HPC. However, making efficient use of the computational power available with multiple GPUs is challenging. The main challenges in achieving good performance are memory layout, affecting…
Hardware heterogeneity is here to stay for high-performance computing. Large-scale systems are currently equipped with multiple GPU accelerators per compute node and are expected to incorporate more specialized hardware. This shift in the…
Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on the issue of automatically asynchronous MPI…