Related papers: Taskgraph: A Low Contention OpenMP Tasking Framewo…

Supporting OpenMP 5.0 Tasks in hpxMP -- A study of an OpenMP implementation within Task Based Runtime Systems

OpenMP has been the de facto standard for single node parallelism for more than a decade. Recently, asynchronous many-task runtime (AMT) systems have increased in popularity as a new programming paradigm for high performance computing…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-20 Tianyi Zhang , Shahrzad Shirzad , Bibek Wagle , Adrian S. Lemoine , Patrick Diehl , Hartmut Kaiser

Asynchronous Runtime with Distributed Manager for Task-based Programming Models

Parallel task-based programming models, like OpenMP, allow application developers to easily create a parallel version of their sequential codes. The standard OpenMP 4.0 introduced the possibility of describing a set of data dependences per…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-09 Jaume Bosch , Carlos Álvarez , Daniel Jiménez-González , Xavier Martorell , Eduard Ayguadé

Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System

Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach. Taskflow introduces an expressive task graph programming model to assist developers in the implementation of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-08 Tsung-Wei Huang , Dian-Lun Lin , Chun-Xun Lin , Yibo Lin

The OpenMP Cluster Programming Model

Despite the various research initiatives and proposed programming models, efficient solutions for parallel programming in HPC clusters still rely on a complex combination of different programming models (e.g., OpenMP and MPI), languages…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-16 Hervé Yviquel , Marcio Pereira , Emílio Francesquini , Guilherme Valarini , Gustavo Leite , Pedro Rosso , Rodrigo Ceccato , Carla Cusihualpa , Vitoria Dias , Sandro Rigo , Alan Souza , Guido Araujo

ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels

GPU-based HPC clusters are attracting more scientific application developers due to their extensive parallelism and energy efficiency. In order to achieve portability among a variety of multi/many core architectures, a popular choice for an…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-10 Ali TehraniJamsaz , Alok Mishra , Akash Dutta , Abid M. Malik , Barbara Chapman , Ali Jannesari

Advanced Synchronization Techniques for Task-based Runtime Systems

Task-based programming models like OmpSs-2 and OpenMP provide a flexible data-flow execution model to exploit dynamic, irregular and nested parallelism. Providing an efficient implementation that scales well with small granularity tasks…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-18 David Álvarez , Kevin Sala , Marcos Maroñas , Aleix Roca , Vicenç Beltran

Accelerating Task-based Iterative Applications

Task-based programming models have risen in popularity as an alternative to traditional fork-join parallelism. They are better suited to write applications with irregular parallelism that can present load imbalance. However, these…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-15 David Álvarez , Vicenç Beltran

Task-Graph Scheduling Extensions for Efficient Synchronization and Communication

Task graphs have been studied for decades as a foundation for scheduling irregular parallel applications and incorporated in programming models such as OpenMP. While many high-performance parallel libraries are based on task graphs, they…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-09 Seonmyeong Bak , Oscar Hernandez , Mark Gates , Piotr Luszczek , Vivek Sarkar

Hypergraph-Aided Task-Resource Matching for Maximizing Value of Task Completion in Collaborative IoT Systems

With the growing scale and intrinsic heterogeneity of Internet of Things (IoT) systems, distributed device collaboration becomes essential for effective task completion by dynamically utilizing limited communication and computing resources.…

Systems and Control · Electrical Eng. & Systems 2025-08-04 Botao Zhu , Xianbin Wang

Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents

Recent advances in large language models (LLMs) have enabled agents to autonomously execute complex, long-horizon tasks, yet planning remains a primary bottleneck for reliable task execution. Existing methods typically fall into two…

Artificial Intelligence · Computer Science 2026-01-13 Yunfan Li , Bingbing Xu , Xueyun Tian , Xiucheng Xu , Huawei Shen

A Parallel Task-based Approach to Linear Algebra

Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-07 Ashkan Tousimojarad , Wim Vanderbauwhede

HGMP:Heterogeneous Graph Multi-Task Prompt Learning

The pre-training and fine-tuning methods have gained widespread attention in the field of heterogeneous graph neural networks due to their ability to leverage large amounts of unlabeled data during the pre-training phase, allowing the model…

Machine Learning · Computer Science 2025-07-11 Pengfei Jiao , Jialong Ni , Di Jin , Xuan Guo , Huan Liu , Hongjiang Chen , Yanxian Bi

Experiences with task-based programming using cluster nodes as OpenMP devices

Programming a distributed system, such as a cluster, requires extended use of low-level communication libraries and can often become cumbersome and error prone for the average developer. In this work, we consider each node of a cluster as a…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-24 Ilias Keftakis , Vassilios V. Dimakopoulos

Dependency Graph Approach for Multiprocessor Real-Time Synchronization

Over the years, many multiprocessor locking protocols have been designed and analyzed. However, the performance of these protocols highly depends on how the tasks are partitioned and prioritized and how the resources are shared locally and…

Operating Systems · Computer Science 2018-09-11 Jian-Jia Chen , Georg von der Brüggen , Junjie Shi , Niklas Uete

LB4OMP: A Dynamic Load Balancing Library for Multithreaded Applications

Exascale computing systems will exhibit high degrees of hierarchical parallelism, with thousands of computing nodes and hundreds of cores per node. Efficiently exploiting hierarchical parallelism is challenging due to load imbalance that…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-29 Jonas H. Müller Korndörfer , Ahmed Eleliemy , Ali Mohammed , Florina M. Ciorba

DiOMP-Offloading: Toward Portable Distributed Heterogeneous OpenMP

As core counts and heterogeneity rise in HPC, traditional hybrid programming models face challenges in managing distributed GPU memory and ensuring portability. This paper presents DiOMP, a distributed OpenMP framework that unifies OpenMP…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-20 Baodi Shan , Mauricio Araya-Polo , Barbara Chapman

Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems

Achieving efficient task parallelism on many-core architectures is an important challenge. The widely used GNU OpenMP implementation of the popular OpenMP parallel programming model incurs high overhead for fine-grained, short-running tasks…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-20 Wenyi Wang , Maxime Gonthier , Poornima Nookala , Haochen Pan , Ian Foster , Ioan Raicu , Kyle Chard

Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors

Asymmetric multicore processors (AMPs) couple high-performance big cores and low-power small cores with the same instruction-set architecture but different features, such as clock frequency or microarchitecture. Previous work has shown that…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-13 Juan Carlos Saez , Fernando Castro , Manuel Prieto-Matias

Exploring Fine-grained Task Parallelism on Simultaneous Multithreading Cores

Nowadays, latency-critical, high-performance applications are parallelized even on power-constrained client systems to improve performance. However, an important scenario of fine-grained tasking on simultaneous multithreading CPU cores in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-03 Denis Los , Igor Petushkov

A Granularity Characterization of Task Scheduling Effectiveness

Task-based runtime systems provide flexible load balancing and portability for parallel scientific applications, but their strong scaling is highly sensitive to task granularity. As parallelism increases, scheduling overhead may transition…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-26 Sana Taghipour Anvari , David Kaeli