Related papers: A task-based data-flow methodology for programming…

A Programming Model for Hybrid Workflows: combining Task-based Workflows and Dataflows all-in-one

This paper tries to reduce the effort of learning, deploying, and integrating several frameworks for the development of e-Science applications that combine simulations with High-Performance Data Analytics (HPDA). We propose a way to extend…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-10 Cristian Ramon-Cortes, Francesc Lordan, Jorge Ejarque, Rosa M. Badia

Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System

Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach. Taskflow introduces an expressive task graph programming model to assist developers in the implementation of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-08 Tsung-Wei Huang, Dian-Lun Lin, Chun-Xun Lin, Yibo Lin

Toward Heterogeneous, Distributed, and Energy-Efficient Computing with SYCL

Programming modern high-performance computing systems is challenging due to the need to efficiently program GPUs and accelerators and to handle data movement between nodes. The C++ language has been continuously enhanced in recent years…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-12 Biagio Cosenza, Lorenzo Carpentieri, Kaijie Fan, Marco D'Antonio, Peter Thoman, Philip Salzmann

Concurrent CPU-GPU Task Programming using Modern C++

In this paper, we introduce Heteroflow, a new C++ library to help developers quickly write parallel CPU-GPU programs using task dependency graphs. Heteroflow leverages the power of modern C++ and task-based approaches to enable efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-17 Tsung-Wei Huang, Yibo Lin

A Massive Data Parallel Computational Framework for Petascale/Exascale Hybrid Computer Systems

Heterogeneous systems are becoming more common on High Performance Computing (HPC) systems. Even using tools like CUDA and OpenCL it is a non-trivial task to obtain optimal performance on the GPU. Approaches to simplifying this task include…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-01-11 Marek Blazewicz, Steven R. Brandt, Peter Diener, David M. Koppelman, Krzysztof Kurowski, Frank Löffler, Erik Schnetter, Jian Tao

HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy

The pervasive adoption of Deep Learning (DL) and Graph Processing (GP) makes it a de facto requirement to build large-scale clusters of heterogeneous accelerators including GPUs and FPGAs. The OpenCL programming framework can be used on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-19 Yao Chen, Xin Long, Jiong He, Yuhang Chen, Hongshi Tan, Zhenxiang Zhang, Marianne Winslett, Deming Chen

Dato: A Task-Based Programming Model for Dataflow Accelerators

Recent deep learning workloads increasingly push computational demand beyond what current memory systems can sustain, with many kernels stalling on data movement rather than computation. While modern dataflow accelerators incorporate…

Programming Languages · Computer Science 2025-09-09 Shihan Fang, Hongzheng Chen, Niansong Zhang, Jiajie Li, Han Meng, Adrian Liu, Zhiru Zhang

Fork is All You Need in Heterogeneous Systems

We present a unified programming model for heterogeneous computing systems. Such systems integrate multiple computing accelerators and memory units to deliver higher performance than CPU-centric systems. Although heterogeneous systems have…

Emerging Technologies · Computer Science 2024-04-18 Zixuan Wang, Jishen Zhao

TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs

Despite the increasing adoption of Field-Programmable Gate Arrays (FPGAs) in compute clouds, there remains a significant gap in programming tools and abstractions which can leverage network-connected, cloud-scale, multi-die FPGAs to…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-05 Neha Prakriya, Yuze Chi, Suhail Basalama, Linghao Song, Jason Cong

A Unified Programming Model for Heterogeneous Computing with CPU and Accelerator Technologies

This paper consists of three parts. The first part provides a unified programming model for heterogeneous computing with CPU and accelerator (like GPU, FPGA, Google TPU, Atos QPU, and more) technologies. To some extent, this new programming…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-31 Yuqing Xiong

Introducing the Task-Aware Storage I/O (TASIO) Library

Task-based programming models are excellent tools to parallelize and seamlessly load balance an application workload. However, the integration of I/O intensive applications and task-based programming models is lacking. Typically, I/O…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-30 Aleix Roca Nonell, Vicenç Beltran Querol, Sergi Mateo Bellido

Exploiting co-execution with oneAPI: heterogeneity from a modern perspective

Programming efficiently heterogeneous systems is a major challenge, due to the complexity of their architectures. Intel oneAPI, a new and powerful standards-based unified programming model, built on top of SYCL, addresses these issues. In…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-16 Raúl Nozal, Jose Luis Bosque

An Open-Source HW-SW Co-Development Framework Enabling Efficient Multi-Accelerator Systems

Heterogeneous accelerator-centric compute clusters are emerging as efficient solutions for diverse AI workloads. However, current integration strategies often compromise data movement efficiency and encounter compatibility issues in…

Hardware Architecture · Computer Science 2025-08-21 Ryan Albert Antonio, Joren Dumoulin, Xiaoling Yi, Josse Van Delm, Yunhao Deng, Guilherme Paim, Marian Verhelst

PySchedCL: Leveraging Concurrency in Heterogeneous Data-Parallel Systems

In the past decade, high performance compute capabilities exhibited by heterogeneous GPGPU platforms have led to the popularity of data parallel programming languages such as CUDA and OpenCL. Such languages, however, involve a steep…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-17 Anirban Ghose, Siddharth Singh, Vivek Kulaharia, Lokesh Dokara, Srijeeta Maity, Soumyajit Dey

Arax: A Runtime Framework for Decoupling Applications from Heterogeneous Accelerators

Today, using multiple heterogeneous accelerators efficiently from applications and high-level frameworks, such as TensorFlow and Caffe, poses significant challenges in three respects: (a) sharing accelerators, (b) allocating available…

Systems and Control · Electrical Eng. & Systems 2023-05-03 Manos Pavlidakis, Stelios Mavridis, Antony Chazapis, Giorgos Vasiliadis, Angelos Bilas

A Survey of Real-time Scheduling on Accelerator-based Heterogeneous Architecture for Time Critical Applications

Accelerator-based heterogeneous architectures, such as CPU-GPU, CPU-TPU, and CPU-FPGA systems, are widely adopted to support the popular artificial intelligence (AI) algorithms that demand intensive computation. When deployed in real-time…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-20 An Zou, Yuankai Xu, Yinchen Ni, Jintao Chen, Yehan Ma, Jing Li, Christopher Gill, Xuan Zhang, Yier Jin

Open SYCL on heterogeneous GPU systems: A case of study

Computational platforms for high-performance scientific applications are becoming more heterogeneous, including hardware accelerators such as multiple GPUs. Applications in a wide variety of scientific fields require an efficient and careful…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-12 Rocío Carratalá-Sáez, Francisco J. Andújar, Yuri Torres, Arturo Gonzalez-Escribano, Diego R. Llanos

Concurrent Scheduling of High-Level Parallel Programs on Multi-GPU Systems

Parallel programming models can encourage performance portability by moving the responsibility for work assignment and data distribution from the programmer to a runtime system. However, analyzing the resulting implicit memory allocations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-14 Fabian Knorr, Philip Salzmann, Peter Thoman, Thomas Fahringer

Enabling Dynamic and Intelligent Workflows for HPC, Data Analytics, and AI Convergence

The evolution of High-Performance Computing (HPC) platforms enables the design and execution of progressively larger and more complex workflow applications in these systems. The complexity comes not only from the number of elements that…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-16 Jorge Ejarque, Rosa M. Badia, Loïc Albertin, Giovanni Aloisio, Enrico Baglione, Yolanda Becerra, Stefan Boschert, Julian R. Berlin, Alessandro D'Anca, Donatello Elia, François Exertier, Sandro Fiore, José Flich, Arnau Folch, Steven J Gibbons, Nikolay Koldunov, Francesc Lordan, Stefano Lorito, Finn Løvholt, Jorge Macías, Fabrizio Marozzo, Alberto Michelini, Marisol Monterrubio-Velasco, Marta Pienkowska, Josep de la Puente, Anna Queralt, Enrique S. Quintana-Ortí, Juan E. Rodríguez, Fabrizio Romano, Riccardo Rossi, Jedrzej Rybicki, Miroslaw Kupczyk, Jacopo Selva, Domenico Talia, Roberto Tonini, Paolo Trunfio, Manuela Volp

Enabling Scientific Workflow Scheduling Research in Non-Uniform Memory Access Architectures

Data-intensive scientific workflows increasingly rely on high-performance computing (HPC) systems, complementing traditional Grid and Cloud platforms. However, workflow scheduling on HPC infrastructures remains challenging due to the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-26 Aurelio Vivas, Harold Castro