Related papers: CppSs -- a C++ Library for Efficient Task Parallel…
The rise of serverless computing introduced a new class of scalable, elastic and widely available parallel workers in the cloud. Many systems and applications benefit from offloading computations and parallel tasks to dynamically allocated…
Currently, multi/many-core CPUs are considered standard in most types of computers including, mobile phones, PCs or supercomputers. However, the parallelization of applications as well as refactoring/design of applications for efficient…
Parallelization is needed everywhere, from laptops and mobile phones to supercomputers. Among parallel programming models, task-based programming has demonstrated a powerful potential and is widely used in high-performance scientific…
Parallel programming remains a daunting challenge, from the struggle to express a parallel algorithm without cluttering the underlying synchronous logic, to describing which devices to employ in a calculation, to correctness. Over the…
Task-based execution frameworks, such as parallel programming libraries, computational workflow systems, and function-as-a-service platforms, enable the composition of distinct tasks into a single, unified application designed to achieve a…
As the complexity and scale of modern parallel machines continue to grow, programmers increasingly rely on composition of software libraries to encapsulate and exploit parallelism. However, many libraries are not designed with composition…
We present DASH, a C++ template library that offers distributed data structures and parallel algorithms and implements a compiler-free PGAS (partitioned global address space) approach. DASH offers many productivity and performance features…
We present ensmallen, a fast and flexible C++ library for mathematical optimization of arbitrary user-supplied functions, which can be applied to many machine learning problems. Several types of optimizations are supported, including…
Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient…
Parsl is a parallel programming library for Python that aims to make it easy to specify parallelism in programs and to realize that parallelism on arbitrary parallel and distributed computing systems. Parsl relies on developers annotating…
We propose Chunks and Tasks, a parallel programming model built on abstractions for both data and work. The application programmer specifies how data and work can be split into smaller pieces, chunks and tasks, respectively. The Chunks and…
In this paper, we introduce Heteroflow, a new C++ library to help developers quickly write parallel CPU-GPU programs using task dependency graphs. Heteroflow leverages the power of modern C++ and task-based approaches to enable efficient…
The C++ programming language is not only a keystone of the high-performance-computing ecosystem but has proven to be a successful base for portable parallel-programming frameworks. As is well known, C++ programmers use templates to…
This paper describes QuickSched, a compact and efficient Open-Source C-language library for task-based shared-memory parallel programming. QuickSched extends the standard dependency-only scheme of task-based programming with the concept of…
This report provides an introduction to the ensmallen numerical optimization library, as well as a deep dive into the technical details of how it works. The library provides a fast and flexible C++ framework for mathematical optimization of…
In this paper, we introduce Continuation Passing C (CPC), a programming language for concurrent systems in which native and cooperative threads are unified and presented to the programmer as a single abstraction. The CPC compiler uses a…
As computational challenges in optimization and statistical inference grow ever harder, algorithms that utilize derivatives are becoming increasingly more important. The implementation of the derivatives that make these algorithms so…
C/C++/OpenCL-based high-level synthesis (HLS) becomes more and more popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of results (QoR) and short…
When implementing functionality which requires sparse matrices, there are numerous storage formats to choose from, each with advantages and disadvantages. To achieve good performance, several formats may need to be used in one program,…
This paper provides the description of a novel, multi-purpose spline library. In accordance with the increasingly diverse modes of usage of splines, it is multi-purpose in the sense that it supports geometry representation, finite element…