Related papers: An Efficient OpenMP Runtime System for Hierarchica…

Towards Efficient OpenMP Strategies for Non-Uniform Architectures

Parallel processing is considered as todays and future trend for improving performance of computers. Computing devices ranging from small embedded systems to big clusters of computers rely on parallelizing applications to reduce execution…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-27 Oussama Tahan

Building Portable Thread Schedulers for Hierarchical Multiprocessors: the BubbleSched Framework

Exploiting full computational power of current more and more hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture. Unfortunately, most operating systems…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-06-15 Samuel Thibault , Raymond Namyst , Pierre-André Wacrenier

New Thread Migration Strategies for NUMA Systems

Multicore systems present on-board memory hierarchies and communication networks that influence performance when executing shared memory parallel codes. Characterising this influence is complex, and understanding the effect of particular…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-01 O. G. Lorenzo , M. L. Becoña , T. F. Pena , J. C. Cabaleiro , J. A. Lorenzo , F. F. Rivera

Proactive bottleneck performance analysis in parallel computing using openMP

The aim of parallel computing is to increase an application performance by executing the application on multiple processors. OpenMP is an API that supports multi platform shared memory programming model and shared-memory programs are…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-12 Vibha Rajput , Alok Katiyar

Generalizing Hierarchical Parallelism

Since the days of OpenMP 1.0 computer hardware has become more complex, typically by specializing compute units for coarse- and fine-grained parallelism in incrementally deeper hierarchies of parallelism. Newer versions of OpenMP reacted by…

Programming Languages · Computer Science 2023-09-06 Michael Kruse

Optimizing ccNUMA locality for task-parallel execution under OpenMP and TBB on multicore-based systems

Task parallelism as employed by the OpenMP task construct or some Intel Threading Building Blocks (TBB) components, although ideal for tackling irregular problems or typical producer/consumer schemes, bears some potential for performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-01-04 Markus Wittmann , Georg Hager

Parallelize Bubble Sort Algorithm Using OpenMP

Sorting has been a profound area for the algorithmic researchers and many resources are invested to suggest more works for sorting algorithms. For this purpose, many existing sorting algorithms were observed in terms of the efficiency of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-07-25 Zaid Abdi Alkareem Alyasseri , Kadhim Al-Attar , Mazin Nasser

Synch: A framework for concurrent data-structures and benchmarks

The recent advancements in multicore machines highlight the need to simplify concurrent programming in order to leverage their computational power. One way to achieve this is by designing efficient concurrent data structures (e.g. stacks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-31 Nikolaos D. Kallimanis

Asynchronous Runtime with Distributed Manager for Task-based Programming Models

Parallel task-based programming models, like OpenMP, allow application developers to easily create a parallel version of their sequential codes. The standard OpenMP 4.0 introduced the possibility of describing a set of data dependences per…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-09 Jaume Bosch , Carlos Álvarez , Daniel Jiménez-González , Xavier Martorell , Eduard Ayguadé

Experiences with task-based programming using cluster nodes as OpenMP devices

Programming a distributed system, such as a cluster, requires extended use of low-level communication libraries and can often become cumbersome and error prone for the average developer. In this work, we consider each node of a cluster as a…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-24 Ilias Keftakis , Vassilios V. Dimakopoulos

LB4OMP: A Dynamic Load Balancing Library for Multithreaded Applications

Exascale computing systems will exhibit high degrees of hierarchical parallelism, with thousands of computing nodes and hundreds of cores per node. Efficiently exploiting hierarchical parallelism is challenging due to load imbalance that…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-29 Jonas H. Müller Korndörfer , Ahmed Eleliemy , Ali Mohammed , Florina M. Ciorba

OpenMP Parallelization of Dynamic Programming and Greedy Algorithms

Multicore has emerged as a typical architecture model since its advent and stands now as a standard. The trend is to increase the number of cores and improve the performance of the memory system. Providing an efficient multicore…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-22 Claude Tadonki

Learning-based Dynamic Pinning of Parallelized Applications in Many-Core Systems

Motivated by the need for adaptive, secure and responsive scheduling in a great range of computing applications, including human-centered and time-critical applications, this paper proposes a scheduling framework that seamlessly adds…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-14 Georgios C. Chasparis , Vladimir Janjic , Michael Rossbory

Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors

Asymmetric multicore processors (AMPs) couple high-performance big cores and low-power small cores with the same instruction-set architecture but different features, such as clock frequency or microarchitecture. Previous work has shown that…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-13 Juan Carlos Saez , Fernando Castro , Manuel Prieto-Matias

Optimizing the hybrid parallelization of BHAC

We present our experience with the modernization on the GR-MHD code BHAC, aimed at improving its novel hybrid (MPI+OpenMP) parallelization scheme. In doing so, we showcase the use of performance profiling tools usable on x86 (Intel-based)…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-30 Salvatore Cielo , Oliver Porth , Luigi Iapichino , Anupam Karmakar , Hector Olivares , Chun Xia

MLP Aware Scheduling Techniques in Multithreaded Processors

Major chip manufacturers have all introduced Multithreaded processors. These processors are used for running a variety of workloads. Efficient resource utilization is an important design aspect in such processors. Particularly, it is…

Performance · Computer Science 2019-08-13 Murthy Durbhakula

Analysis and Characterization of Performance Variability for OpenMP Runtime

In the high performance computing (HPC) domain, performance variability is a major scalability issue for parallel computing applications with heavy synchronization and communication. In this paper, we present an experimental performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-10 Minyu Cui , Nikela Papadopoulou , Miquel Pericàs

Effect of Thread Level Parallelism on the Performance of Optimum Architecture for Embedded Applications

According to the increasing complexity of network application and internet traffic, network processor as a subset of embedded processors have to process more computation intensive tasks. By scaling down the feature size and emersion of chip…

Hardware Architecture · Computer Science 2012-04-13 Mehdi Alipour , Hojjat Taghdisi

FastFlow: Efficient Parallel Streaming Applications on Multi-core

Shared memory multiprocessors come back to popularity thanks to rapid spreading of commodity multi-core architectures. As ever, shared memory programs are fairly easy to write and quite hard to optimise; providing multi-core programmers…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-09-10 Marco Aldinucci , Massimo Torquati , Massimiliano Meneghin

Enabling OpenMP Task Parallelism on Multi-FPGAs

FPGA-based hardware accelerators have received increasing attention mainly due to their ability to accelerate deep pipelined applications, thus resulting in higher computational performance and energy efficiency. Nevertheless, the amount of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-23 R. Nepomuceno , R. Sterle , G. Valarini , M. Pereira , H. Yviquel , G. Araujo