Related papers: Advanced Synchronization Techniques for Task-based…
Parallel task-based programming models, like OpenMP, allow application developers to easily create a parallel version of their sequential codes. The standard OpenMP 4.0 introduced the possibility of describing a set of data dependences per…
OpenMP is the de-facto standard for shared memory systems in High-Performance Computing (HPC). It includes a task-based model that offers a high-level of abstraction to effectively exploit highly dynamic structured and unstructured…
OpenMP has been the de facto standard for single node parallelism for more than a decade. Recently, asynchronous many-task runtime (AMT) systems have increased in popularity as a new programming paradigm for high performance computing…
Task-based programming models have risen in popularity as an alternative to traditional fork-join parallelism. They are better suited to write applications with irregular parallelism that can present load imbalance. However, these…
The orchestration of complex algorithms demands high levels of automation to use modern hardware efficiently. Task-based programming with OpenMP 5.0 is a prominent candidate to accomplish this goal. We study OpenMP 5.0's tasking in the…
Asymmetric multicore processors (AMPs) couple high-performance big cores and low-power small cores with the same instruction-set architecture but different features, such as clock frequency or microarchitecture. Previous work has shown that…
Task graphs have been studied for decades as a foundation for scheduling irregular parallel applications and incorporated in programming models such as OpenMP. While many high-performance parallel libraries are based on task graphs, they…
Task-based programming models have become very popular, as they offer an attractive solution to parallelize serial application code with task and data annotations. They usually depend on a runtime system that schedules the tasks to multiple…
Shared resource interference is observed by applications as dynamic performance asymmetry. Prior art has developed approaches to reduce the impact of performance asymmetry mainly at the operating system and architectural levels. In this…
In the high performance computing (HPC) domain, performance variability is a major scalability issue for parallel computing applications with heavy synchronization and communication. In this paper, we present an experimental performance…
Real-time systems applications usually consist of a set of concurrent activities with timing-related properties. Developing these applications requires programming paradigms that can effectively handle the specification of concurrent…
Task-based runtime systems provide flexible load balancing and portability for parallel scientific applications, but their strong scaling is highly sensitive to task granularity. As parallelism increases, scheduling overhead may transition…
Despite the various research initiatives and proposed programming models, efficient solutions for parallel programming in HPC clusters still rely on a complex combination of different programming models (e.g., OpenMP and MPI), languages…
Programming a distributed system, such as a cluster, requires extended use of low-level communication libraries and can often become cumbersome and error prone for the average developer. In this work, we consider each node of a cluster as a…
Task parallelism as employed by the OpenMP task construct, although ideal for tackling irregular problems or typical producer/consumer schemes, bears some potential for performance bottlenecks if locality of data access is important, which…
In advancing parallel programming, particularly with OpenMP, the shift towards NLP-based methods marks a significant innovation beyond traditional S2S tools like Autopar and Cetus. These NLP approaches train on extensive datasets of…
In light of continued advances in loop scheduling, this work revisits the OpenMP loop scheduling by outlining the current state of the art in loop scheduling and presenting evidence that the existing OpenMP schedules are insufficient for…
Parallel processing is considered as todays and future trend for improving performance of computers. Computing devices ranging from small embedded systems to big clusters of computers rely on parallelizing applications to reduce execution…
Scientific and data science applications are becoming increasingly complex, with growing computational and memory demands. Modern high performance computing (HPC) systems provide high parallelism and heterogeneity across nodes, devices, and…
There are many science applications that require scalable task-level parallelism and support for flexible execution and coupling of ensembles of simulations. Most high-performance system software and middleware, however, are designed to…