Related papers: Task-Graph Scheduling Extensions for Efficient Syn…

Plan-over-Graph: Towards Parallelable LLM Agent Schedule

Large Language Models (LLMs) have demonstrated exceptional abilities in reasoning for task planning. However, challenges remain under-explored for parallel schedules. This paper introduces a novel paradigm, plan-over-graph, in which the…

Artificial Intelligence · Computer Science 2025-02-21 Shiqi Zhang , Xinbei Ma , Zouying Cao , Zhuosheng Zhang , Hai Zhao

Asynchronous Runtime with Distributed Manager for Task-based Programming Models

Parallel task-based programming models, like OpenMP, allow application developers to easily create a parallel version of their sequential codes. The standard OpenMP 4.0 introduced the possibility of describing a set of data dependences per…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-09 Jaume Bosch , Carlos Álvarez , Daniel Jiménez-González , Xavier Martorell , Eduard Ayguadé

Streaming Task Graph Scheduling for Dataflow Architectures

Dataflow devices represent an avenue towards saving the control and data movement overhead of Load-Store Architectures. Various dataflow accelerators have been proposed, but how to efficiently schedule applications on such devices remains…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-06 Tiziano De Matteis , Lukas Gianinazzi , Johannes de Fine Licht , Torsten Hoefler

Taskgraph: A Low Contention OpenMP Tasking Framework

OpenMP is the de-facto standard for shared memory systems in High-Performance Computing (HPC). It includes a task-based model that offers a high-level of abstraction to effectively exploit highly dynamic structured and unstructured…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-12 Chenle Yu , Sara Royuela , Eduardo Quiñones

Concurrent Scheduling of High-Level Parallel Programs on Multi-GPU Systems

Parallel programming models can encourage performance portability by moving the responsibility for work assignment and data distribution from the programmer to a runtime system. However, analyzing the resulting implicit memory allocations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-14 Fabian Knorr , Philip Salzmann , Peter Thoman , Thomas Fahringer

Analysis of Workflow Schedulers in Simulated Distributed Environments

Task graphs provide a simple way to describe scientific workflows (sets of tasks with dependencies) that can be executed on both HPC clusters and in the cloud. An important aspect of executing such graphs is the used scheduling algorithm.…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-18 Jakub Beránek , Stanislav Böhm , Vojtěch Cima

Advanced Synchronization Techniques for Task-based Runtime Systems

Task-based programming models like OmpSs-2 and OpenMP provide a flexible data-flow execution model to exploit dynamic, irregular and nested parallelism. Providing an efficient implementation that scales well with small granularity tasks…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-18 David Álvarez , Kevin Sala , Marcos Maroñas , Aleix Roca , Vicenç Beltran

Automatic Task Parallelization of Dataflow Graphs in ML/DL models

Several methods exist today to accelerate Machine Learning(ML) or Deep-Learning(DL) model performance for training and inference. However, modern techniques that rely on various graph and operator parallelism methodologies rely on search…

Machine Learning · Computer Science 2023-08-23 Srinjoy Das , Lawrence Rauchwerger

Efficient Two-Level Scheduling for Concurrent Graph Processing

With the rapidly growing demand of graph processing in the real scene, they have to efficiently handle massive concurrent jobs. Although existing work enable to efficiently handle single graph processing job, there are plenty of memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-05 Jin Zhao

Scheduling Fork-Join Task Graphs to Heterogeneous Processors

The scheduling of task graphs with communication delays has been extensively studied. Recently, new results for the common sub-case of fork-join shaped task graphs were published, including an EPTAS and polynomial algorithms for special…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-06 Huijun Wang , Oliver Sinnen

Dependency Graph Approach for Multiprocessor Real-Time Synchronization

Over the years, many multiprocessor locking protocols have been designed and analyzed. However, the performance of these protocols highly depends on how the tasks are partitioned and prioritized and how the resources are shared locally and…

Operating Systems · Computer Science 2018-09-11 Jian-Jia Chen , Georg von der Brüggen , Junjie Shi , Niklas Uete

Efficiently Scheduling Parallel DAG Tasks on Identical Multiprocessors

Parallel real-time embedded applications can be modelled as directed acyclic graphs (DAGs) whose nodes model subtasks and whose edges model precedence constraints among subtasks. Efficiently scheduling such parallel tasks can be challenging…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-24 Shardul Lendve , Konstantinos Bletsas , Pedro F. Souto

A Lock-Free Work-Stealing Algorithm for Bulk Operations

Work-stealing is a widely used technique for balancing irregular parallel workloads, and most modern runtime systems adopt lock-free work-stealing deques to reduce contention and improve scalability. However, existing algorithms are…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-09 Raja Sai Nandhan Yadav Kataru , Danial Davarnia , Ali Jannesari

Task Graph Transformations for Latency Tolerance

The Integrative Model for Parallelism (IMP) derives a task graph from a higher level description of parallel algorithms. In this note we show how task graph transformations can be used to achieve latency tolerance in the program execution.…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-14 Victor Eijkhout

Parallel Path Progression DAG Scheduling

To satisfy the increasing performance needs of modern cyber-physical systems, multiprocessor architectures are increasingly utilized. To efficiently exploit their potential parallelism in hard real-time systems, appropriate task models and…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-26 Niklas Ueter , Mario Günzel , Georg von der Brüggen , Jian-Jia Chen

Configurable Strategies for Work-stealing

Work-stealing systems are typically oblivious to the nature of the tasks they are scheduling. For instance, they do not know or take into account how long a task will take to execute or how many subtasks it will spawn. Moreover, the actual…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-05-29 Martin Wimmer , Daniel Cederman , Jesper Larsson Träff , Philippas Tsigas

A Proof of Concept for Optimizing Task Parallelism by Locality Queues

Task parallelism as employed by the OpenMP task construct, although ideal for tackling irregular problems or typical producer/consumer schemes, bears some potential for performance bottlenecks if locality of data access is important, which…

Performance · Computer Science 2009-02-12 Markus Wittmann , Georg Hager

Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

There are billions of lines of sequential code inside nowadays' software which do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the…

Programming Languages · Computer Science 2016-04-13 Alcides Fonseca , Bruno Cabral , João Rafael , Ivo Correia

Worksharing Tasks: An Efficient Way to Exploit Irregular and Fine-Grained Loop Parallelism

Shared memory programming models usually provide worksharing and task constructs. The former relies on the efficient fork-join execution model to exploit structured parallelism; while the latter relies on fine-grained synchronization among…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-08 M. Maronas , K. Sala , S. Mateo , E. Ayguadé , V. Beltran Barcelona Supercomputing Center

Scheduling of Graph Queries: Controlling Intra- and Inter-query Parallelism for a High System Throughput

The vast amounts of data used in social, business or traffic networks, biology and other natural sciences are often managed in graph-based data sets, consisting of a few thousand up to billions and trillions of vertices and edges,…

Databases · Computer Science 2021-10-22 Matthias Hauck , Ismail Oukid , Holger Fröning