Related papers: A Lock-Free Work-Stealing Algorithm for Bulk Opera…

Fully Read/Write Fence-Free Work-Stealing with Multiplicity

Work-stealing is a popular technique to implement dynamic load balancing in a distributed manner. In this approach, each process owns a set of tasks that have to be executed. The owner of the set can put tasks in it and can take tasks from…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-23 Armando Castañeda, Miguel Piña

Configurable Strategies for Work-stealing

Work-stealing systems are typically oblivious to the nature of the tasks they are scheduling. For instance, they do not know or take into account how long a task will take to execute or how many subtasks it will spawn. Moreover, the actual…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-05-29 Martin Wimmer, Daniel Cederman, Jesper Larsson Träff, Philippas Tsigas

Work-stealing for mixed-mode parallelism by deterministic team-building

We show how to extend classical work-stealing to deal also with data parallel tasks that can require any number of threads r >= 1 for their execution. We explain in detail the so introduced idea of work-stealing with deterministic…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-12-23 Martin Wimmer, Jesper Larsson Träff

Scheduling computations with provably low synchronization overheads

Work Stealing has been a very successful algorithm for scheduling parallel computations, and is known to achieve high performance even for computations exhibiting fine-grained parallelism. We present a variant of Work Stealing that provably avoids…

Data Structures and Algorithms · Computer Science 2019-04-30 Guilherme Rito, Hervé Paulino

Distributed Work Stealing in a Task-Based Dataflow Runtime

The task-based dataflow programming model has emerged as an alternative to the process-centric programming model for extreme-scale applications. However, load balancing is still a challenge in task-based dataflow runtimes. In this paper, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-11 Joseph John, Josh Milthorpe, Peter Strazdins

Parallel Stream Processing Against Workload Skewness and Variance

Key-based workload partitioning is a common strategy used in parallel stream processing engines, enabling effective key-value tuple distribution over worker threads in a logical operator. While randomized hashing on the keys is capable of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-12-14 Junhua Fang, Rong Zhang, Tom Z. J. Fu, Zhenjie Zhang, Aoying Zhou, Junhua Zhu

On the analysis of scheduling algorithms for structured parallel computations

Algorithms for scheduling structured parallel computations have been widely studied in the literature. For some time now, Work Stealing has been one of the most popular for scheduling such computations, and its performance has been studied in…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-26 Guilherme Rito, Hervé Paulino

On the Efficiency of Localized Work Stealing

This paper investigates a variant of the work-stealing algorithm that we call the localized work-stealing algorithm. The intuition behind this variant is that because of locality, processors can benefit from working on their own work.…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-16 Warut Suksompong, Charles E. Leiserson, Tao B. Schardl

Adaptive Asynchronous Work-Stealing for distributed load-balancing in heterogeneous systems

Supercomputers have revolutionized how industries and scientific fields process large amounts of data. These machines group hundreds or thousands of computing nodes working together to execute time-consuming programs that require a large…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-24 João B. Fernandes, Ítalo A. S. de Assis, Idalmis M. S. Martins, Tiago Barros, Samuel Xavier-de-Souza

Analysis of Work-Stealing and Parallel Cache Complexity

Parallelism has become extremely popular over the past decade, and there have been a lot of new parallel algorithms and software. The randomized work-stealing (RWS) scheduler plays a crucial role in this ecosystem. In this paper, we study…

Data Structures and Algorithms · Computer Science 2021-11-10 Yan Gu, Zachary Napier, Yihan Sun

Are Lock-Free Concurrent Algorithms Practically Wait-Free?

Lock-free concurrent algorithms guarantee that some concurrent operation will always make progress in a finite number of steps. Yet programmers prefer to treat concurrent code as if it were wait-free, guaranteeing that all operations always…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-18 Dan Alistarh, Keren Censor-Hillel, Nir Shavit

Supporting Parallelism in Server-based Multiprocessor Systems

Developing an efficient server-based real-time scheduling solution that supports dynamic task-level parallelism is now relevant to even the desktop and embedded domains and no longer only to the high performance computing market niche. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-06-15 Luís Nogueira, Luís Miguel Pinho

Optimal Hyper-Scalable Load Balancing with a Strict Queue Limit

Load balancing plays a critical role in efficiently dispatching jobs in parallel-server systems such as cloud networks and data centers. A fundamental challenge in the design of load balancing algorithms is to achieve an optimal trade-off…

Performance · Computer Science 2020-12-16 Mark van der Boor, Sem Borst, Johan van Leeuwaarden

Parallelizing Query Optimization on Shared-Nothing Architectures

Data processing systems offer an ever increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query…

Databases · Computer Science 2015-11-06 Immanuel Trummer, Christoph Koch

Flat Parallelization

There are two intertwined factors that affect performance of concurrent data structures: the ability of processes to access the data in parallel and the cost of synchronization. It has been observed that for a large class of…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-10 Vitaly Aksenov, Petr Kuznetsov

Throughput-Optimal Load Balancing for Intra Datacenter Networks

Traffic load-balancing in datacenters alleviates hot spots and improves network utilization. In this paper, a stable in-network load-balancing algorithm is developed in the setting of software-defined networking. A control plane configures…

Networking and Internet Architecture · Computer Science 2016-12-07 Sucha Supittayapornpong, Michael J. Neely

The Adaptive Priority Queue with Elimination and Combining

Priority queues are fundamental abstract data structures, often used to manage limited resources in parallel programming. Several proposed parallel priority queue implementations are based on skiplists, harnessing the potential for…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-08-06 Irina Calciu, Hammurabi Mendes, Maurice Herlihy

Task-Graph Scheduling Extensions for Efficient Synchronization and Communication

Task graphs have been studied for decades as a foundation for scheduling irregular parallel applications and incorporated in programming models such as OpenMP. While many high-performance parallel libraries are based on task graphs, they…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-09 Seonmyeong Bak, Oscar Hernandez, Mark Gates, Piotr Luszczek, Vivek Sarkar

Multi-Queues Can Be State-of-the-Art Priority Schedulers

Designing and implementing efficient parallel priority schedulers is an active research area. An intriguing proposed design is the Multi-Queue: given $n$ threads and $m\ge n$ distinct priority queues, task insertions are performed uniformly…

Data Structures and Algorithms · Computer Science 2021-09-03 Anastasiia Postnikova, Nikita Koval, Giorgi Nadiradze, Dan Alistarh

Co-Scheduling Algorithms for High-Throughput Workload Execution

This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several…

Data Structures and Algorithms · Computer Science 2013-05-01 Guillaume Aupy, Manu Shantharam, Anne Benoit, Yves Robert, Padma Raghavan