Related papers: Distributed Work Stealing in a Task-Based Dataflow…
In this paper, we derive and investigate approaches to dynamically load balance a distributed task parallel application software. The load balancing strategy is based on task migration. Busy processes export parts of their ready task queue…
Work-stealing is a widely used technique for balancing irregular parallel workloads, and most modern runtime systems adopt lock-free work-stealing deques to reduce contention and improve scalability. However, existing algorithms are…
Work-stealing is a popular technique to implement dynamic load balancing in a distributed manner. In this approach, each process owns a set of tasks that have to be executed. The owner of the set can put tasks in it and can take tasks from…
Work sharing and work stealing are two scheduling paradigms to redistribute work when performing distributed computations. In work sharing, processors attempt to migrate pending jobs to other processors in the hope of reducing response…
Work-stealing systems are typically oblivious to the nature of the tasks they are scheduling. For instance, they do not know or take into account how long a task will take to execute or how many subtasks it will spawn. Moreover, the actual…
With the dissemination of affordable parallel and distributed hardware, parallel and distributed constraint solving has lately been the focus of some attention. To effectually apply the power of distributed computational systems, there must…
We show how to extend classical work-stealing to deal also with data parallel tasks that can require any number of threads r >= 1 for their execution. We explain in detail the so introduced idea of work-stealing with deterministic…
We study in this paper the impact of communication latency on the classical Work Stealing load balancing algorithm. Our approach considers existing performance models and the underlying algorithms. We introduce a latency parameter in the…
Supercomputers have revolutionized how industries and scientific fields process large amounts of data. These machines group hundreds or thousands of computing nodes working together to execute time-consuming programs that require a large…
Current high-performance computer systems used for scientific computing typically combine shared memory computational nodes in a distributed memory environment. Extracting high performance from these complex systems requires tailored…
Task graphs have been studied for decades as a foundation for scheduling irregular parallel applications and incorporated in programming models such as OpenMP. While many high-performance parallel libraries are based on task graphs, they…
This paper investigates a variant of the work-stealing algorithm that we call the localized work-stealing algorithm. The intuition behind this variant is that because of locality, processors can benefit from working on their own work.…
Algorithms for scheduling structured parallel computations have been widely studied in the literature. For some time now, Work Stealing is one of the most popular for scheduling such computations, and its performance has been studied in…
Developing an efficient server-based real-time scheduling solution that supports dynamic task-level parallelism is now relevant to even the desktop and embedded domains and no longer only to the high performance computing market niche. This…
We study in this paper the impact of communication latency on the classical Work Stealing load balancing algorithm. Our paper extends the reference model in which we introduce a latency parameter. By using a theoretical analysis and…
Dataflow devices represent an avenue towards saving the control and data movement overhead of Load-Store Architectures. Various dataflow accelerators have been proposed, but how to efficiently schedule applications on such devices remains…
Key-based workload partitioning is a common strategy used in parallel stream processing engines, enabling effective key-value tuple distribution over worker threads in a logical operator. While randomized hashing on the keys is capable of…
We consider a parallel system of $m$ identical machines prone to unpredictable crashes and restarts, trying to cope with the continuous arrival of tasks to be executed. Tasks have different computational requirements (i.e., processing time…
The aim of this paper is to provide a description of deep-learning-based scheduling approach for academic-purpose high-performance computing systems. The share of academic-purpose distributed computing systems (DCS) reaches 17.4 percents…
As numerous machine learning and other algorithms increase in complexity and data requirements, distributed computing becomes necessary to satisfy the growing computational and storage demands, because it enables parallel execution of…