Related papers: OpenCL Actors - Adding Data Parallelism to Actor-b…
The actor model of computation has gained significant popularity over the last decade. Its high level of abstraction makes it appealing for concurrent applications in parallel and distributed systems. However, designing a real-world actor…
Heterogeneous computing platforms consisting of general purpose processors (GPPs) and graphics processing units (GPUs) have become commonplace in personal mobile devices and embedded systems. For years, programming of these platforms was…
Parallel hardware makes concurrency mandatory for efficient program execution. However, writing concurrent software is both challenging and error-prone. C++11 provides standard facilities for multiprogramming, such as atomic operations with…
On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, portability of code and performance are increasingly difficult to achieve as…
The Actor model is a mathematical theory that treats "Actors" as the universal primitives of concurrent digital computation. The model has been used both as a framework for a theoretical understanding of concurrency, and as the theoretical…
Medical image processing is often limited by the computational cost of the involved algorithms. Whereas dedicated computing devices (GPUs in particular) exist and do provide significant efficiency boosts, they have an extra cost of use in…
OpenCL is an open standard for parallel programming of heterogeneous compute devices, such as GPUs, CPUs, DSPs or FPGAs. However, the verbosity of its C host API can hinder application development. In this paper we present cf4ocl, a…
Since the first idea of using GPU to general purpose computing, things have evolved over the years and now there are several approaches to GPU programming. GPU computing practically began with the introduction of CUDA (Compute Unified…
Quantum computing is an emerging technology, promising a paradigm shift in computing, and allowing for speedups in many different problems. However, quantum devices are still in their early stages, most with only a small number qubits. This…
The message-driven nature of actors lays a foundation for developing scalable and distributed software. While the actor itself has been thoroughly modeled, the message passing layer lacks a common definition. Properties and guarantees of…
We propose a novel framework for efficient parallelization of deep reinforcement learning algorithms, enabling these algorithms to learn from multiple actors on a single machine. The framework is algorithm agnostic and can be applied to…
The pervasive adoption of Deep Learning (DL) and Graph Processing (GP) makes it a de facto requirement to build large-scale clusters of heterogeneous accelerators including GPUs and FPGAs. The OpenCL programming framework can be used on the…
Graphics processing units (GPU) had evolved from a specialized hardware capable to render high quality graphics in games to a commodity hardware for effective processing blocks of data in a parallel schema. This evolution is particularly…
OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus…
Programming models for distributed and heterogeneous machines are rapidly growing in popularity to meet the demands of modern workloads. Task and actor models are common choices that offer different trade-offs between development…
The modern trend in High-Performance Computing (HPC) involves the use of accelerators such as Graphics Processing Units (GPUs) alongside Central Processing Units (CPUs) to speed up numerical operations in various applications. Leading…
Designing low-latency cloud-based applications that are adaptable to unpredictable workloads and efficiently utilize modern cloud computing platforms is hard. The actor model is a popular paradigm that can be used to develop distributed…
Heterogeneous clusters with nodes containing one or more accelerators, such as GPUs, have become common. While MPI provides inter-address space communication, and OpenCL provides a process with access to heterogeneous computational…
Parallel data processing has become indispensable for processing applications involving huge data sets. This brings into focus the Graphics Processing Units (GPUs) which emphasize on many-core computing. With the advent of General Purpose…
This paper details an extensible OpenCL framework that allows Stan to utilize heterogeneous compute devices. It includes GPU-optimized routines for the Cholesky decomposition, its derivative, other matrix algebra primitives and some…