Related papers: An Interrupt-Driven Work-Sharing For-Loop Schedule…

A Lock-Free Work-Stealing Algorithm for Bulk Operations

Work-stealing is a widely used technique for balancing irregular parallel workloads, and most modern runtime systems adopt lock-free work-stealing deques to reduce contention and improve scalability. However, existing algorithms are…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-09 Raja Sai Nandhan Yadav Kataru, Danial Davarnia, Ali Jannesari

Scheduling computations with provably low synchronization overheads

Work Stealing has been a very successful algorithm for scheduling parallel computations, and is known to achieve high performances even for computations exhibiting fine-grained parallelism. We present a variant of Work Stealing that provably avoids…

Data Structures and Algorithms · Computer Science 2019-04-30 Guilherme Rito, Hervé Paulino

Fully Read/Write Fence-Free Work-Stealing with Multiplicity

Work-stealing is a popular technique to implement dynamic load balancing in a distributed manner. In this approach, each process owns a set of tasks that have to be executed. The owner of the set can put tasks in it and can take tasks from…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-23 Armando Castañeda, Miguel Piña

An Adaptive Self-Scheduling Loop Scheduler

Many shared-memory parallel irregular applications, such as sparse linear algebra and graph algorithms, depend on efficient loop scheduling (LS) in a fork-join manner despite that the work per loop iteration can greatly vary depending on…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-29 Joshua Dennis Booth, Phillip Lane

Analysis of Work-Stealing and Parallel Cache Complexity

Parallelism has become extremely popular over the past decade, and there have been a lot of new parallel algorithms and software. The randomized work-stealing (RWS) scheduler plays a crucial role in this ecosystem. In this paper, we study…

Data Structures and Algorithms · Computer Science 2021-11-10 Yan Gu, Zachary Napier, Yihan Sun

On the analysis of scheduling algorithms for structured parallel computations

Algorithms for scheduling structured parallel computations have been widely studied in the literature. For some time now, Work Stealing is one of the most popular for scheduling such computations, and its performance has been studied in…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-26 Guilherme Rito, Hervé Paulino

iDDS: Intelligent Distributed Dispatch and Scheduling for Workflow Orchestration

The intelligent Distributed Dispatch and Scheduling (iDDS) service is a versatile workflow orchestration system designed for large-scale, distributed scientific computing. iDDS extends traditional workload and data management by integrating…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-23 Wen Guan, Tadashi Maeno, Aleksandr Alekseev, Fernando Harald Barreiro Megino, Kaushik De, Edward Karavakis, Alexei Klimentov, Tatiana Korchuganova, FaHui Lin, Paul Nilsson, Torre Wenaus, Zhaoyu Yang, Xin Zhao

Stream Iterative Distributed Coded Computing for Learning Applications in Heterogeneous Systems

To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computations is needed. Thus, it is essential to delegate the computations among several…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-29 Homa Esfahanizadeh, Alejandro Cohen, Muriel Medard

Scheduling and Trade-off Analysis for Multi-Source Multi-Processor Systems with Divisible Loads

The main goal of parallel processing is to provide users with performance that is much better than that of single processor systems. The execution of jobs is scheduled, which requires certain resources in order to meet certain criteria.…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-07 Yang Cao, Fei Wu, Thomas Robertazzi

Optimal Divisible Load Scheduling for Resource-Sharing Network

Scheduling is an important task allowing parallel systems to perform efficiently and reliably. For modern computation systems, divisible load is a special type of data which can be divided into arbitrary sizes and independently processed in…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-07 Fei Wu, Yang Cao, Thomas Robertazzi

Configurable Strategies for Work-stealing

Work-stealing systems are typically oblivious to the nature of the tasks they are scheduling. For instance, they do not know or take into account how long a task will take to execute or how many subtasks it will spawn. Moreover, the actual…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-05-29 Martin Wimmer, Daniel Cederman, Jesper Larsson Träff, Philippas Tsigas

Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers

The nested parallel (a.k.a. fork-join) model is widely used for writing parallel programs. However, the two composition constructs, i.e. "‖" (parallel) and ";" (serial), are insufficient in expressing "partial dependencies" or…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-02-16 David Dinh, Harsha Vardhan Simhadri, Yuan Tang

Parallel Closed-Loop Connected Vehicle Simulator for Large-Scale Transportation Network Management: Challenges, Issues, and Solution Approaches

The augmented scale and complexity of urban transportation networks have significantly increased the execution time and resource requirements of vehicular network simulations, exceeding the capabilities of sequential simulators. The need…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-08 Mohammad A Hoque, Xiaoyan Hong, Md Salman Ahmed

Rethinking Thread Scheduling under Oversubscription: A User-Space Framework for Coordinating Multi-runtime and Multi-process Workloads

The convergence of high-performance computing (HPC) and artificial intelligence (AI) is driving the emergence of increasingly complex parallel applications and workloads. These workloads often combine multiple parallel runtimes within the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-29 Aleix Roca, Vicenç Beltran

Asynchronous Runtime with Distributed Manager for Task-based Programming Models

Parallel task-based programming models, like OpenMP, allow application developers to easily create a parallel version of their sequential codes. The standard OpenMP 4.0 introduced the possibility of describing a set of data dependences per…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-09 Jaume Bosch, Carlos Álvarez, Daniel Jiménez-González, Xavier Martorell, Eduard Ayguadé

Shared-object System Equilibria: Delay and Throughput Analysis

We consider shared-object systems that require their threads to fulfill the system jobs by first acquiring sequentially the objects needed for the jobs and then holding on to them until the job completion. Such systems are in the core of a…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-04 Iosif Salem, Elad M. Schiller, Marina Papatriantafilou, Philippas Tsigas

A NUMA-Aware Provably-Efficient Task-Parallel Platform Based on the Work-First Principle

Task parallelism is designed to simplify the task of parallel programming. When executing a task parallel program on modern NUMA architectures, it can fail to scale due to the phenomenon called work inflation, where the overall processing…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-08 Justin Deters, Jiaye Wu, Yifan Xu, I-Ting Angelina Lee

Worksharing Tasks: An Efficient Way to Exploit Irregular and Fine-Grained Loop Parallelism

Shared memory programming models usually provide worksharing and task constructs. The former relies on the efficient fork-join execution model to exploit structured parallelism; while the latter relies on fine-grained synchronization among…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-08 M. Maronas, K. Sala, S. Mateo, E. Ayguadé, V. Beltran (Barcelona Supercomputing Center)

Distributed Work Stealing in a Task-Based Dataflow Runtime

The task-based dataflow programming model has emerged as an alternative to the process-centric programming model for extreme-scale applications. However, load balancing is still a challenge in task-based dataflow runtimes. In this paper, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-11 Joseph John, Josh Milthorpe, Peter Strazdins

Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

Computationally-intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular and a balanced execution of their loop iterations is critical for achieving high performance. However, several…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-25 Ahmed Eleliemy, Florina M. Ciorba