Related papers: Proactive bottleneck performance analysis in paral…

Making Applications Faster by Asynchronous Execution: Slowing Down Processes or Relaxing MPI Collectives

Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on the issue of automatically asynchronous MPI…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-06 Ayesha Afzal, Georg Hager, Stefano Markidis, Gerhard Wellein

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

Mixed-mode implementation of PETSc for scalable linear algebra on multi-core processors

With multi-core processors a ubiquitous building block of modern supercomputers, it is now past time to enable applications to embrace these developments in processor design. To achieve exascale performance, applications will need ways of…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-08-13 Michele Weiland, Lawrence Mitchell, Gerard Gorman, Stephan Kramer, Mark Parsons, James Southern

Towards Efficient OpenMP Strategies for Non-Uniform Architectures

Parallel processing is considered today's and the future trend for improving the performance of computers. Computing devices ranging from small embedded systems to big clusters of computers rely on parallelizing applications to reduce execution…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-27 Oussama Tahan

Analysis and Characterization of Performance Variability for OpenMP Runtime

In the high performance computing (HPC) domain, performance variability is a major scalability issue for parallel computing applications with heavy synchronization and communication. In this paper, we present an experimental performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-10 Minyu Cui, Nikela Papadopoulou, Miquel Pericàs

Performance Evaluation of Parallel Message Passing and Thread Programming Model on Multicore Architectures

The current trend of multicore architectures on shared memory systems underscores the need for parallelism. While there are several programming models to express parallelism, the thread programming model has become a standard to support these system…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-12-13 D. T. Hasta, A. B. Mutiara

Machine Learning Framwork for Performance Anomaly in OpenMP Multi-Threaded Systems

Some OpenMP multi-threaded applications increasingly suffer from performance anomalies owing to shared resource contention as well as software- and hardware-related problems. Such performance anomalies can result in failure and inefficiencies,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-06 Weidong Wang, Wangda Luo

Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors

Asymmetric multicore processors (AMPs) couple high-performance big cores and low-power small cores with the same instruction-set architecture but different features, such as clock frequency or microarchitecture. Previous work has shown that…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-13 Juan Carlos Saez, Fernando Castro, Manuel Prieto-Matias

Benchmarking mixed-mode PETSc performance on high-performance architectures

The trend towards highly parallel multi-processing is ubiquitous in all modern computer architectures, ranging from handheld devices to large-scale HPC systems; yet many applications are struggling to fully utilise the multiple levels of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-07-19 Michael Lange, Gerard Gorman, Michele Weiland, Lawrence Mitchell, Xiaohu Guo, James Southern

Exploring Fine-grained Task Parallelism on Simultaneous Multithreading Cores

Nowadays, latency-critical, high-performance applications are parallelized even on power-constrained client systems to improve performance. However, an important scenario of fine-grained tasking on simultaneous multithreading CPU cores in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-03 Denis Los, Igor Petushkov

Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption

Many important computational problems require utilization of high performance computing (HPC) systems that consist of multi-level structures combining higher and higher numbers of devices with various characteristics. Utilizing full power…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-21 Paweł Rościszewski

Recent Advances in Overcoming Bottlenecks in Memory Systems and Managing Memory Resources in GPU Systems

This article features extended summaries and retrospectives of some of the recent research done by our research group, SAFARI, on (1) various critical problems in memory systems and (2) how memory system bottlenecks affect graphics…

Hardware Architecture · Computer Science 2018-05-30 Onur Mutlu, Saugata Ghose, Rachata Ausavarungnirun

Energy-Efficiency Evaluation of OpenMP Loop Transformations and Runtime Constructs

OpenMP is the de facto API for parallel programming in HPC applications. These programs are often computed in data centers, where energy consumption is a major issue. Whereas previous work has focused almost entirely on performance, we here…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-12 Henrik Valter, Axel Karlsson, Miquel Pericàs

Performance Evaluation of Parallel Algorithms

Evaluating how well a whole system or set of subsystems performs is one of the primary objectives of performance testing. We can tell via performance assessment if the architecture implementation meets the design objectives. Performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-15 Donald Ene Vincent Ike Anireh

Real-Time Parallel Programming: State of Play and Open Issues

Real-time systems applications usually consist of a set of concurrent activities with timing-related properties. Developing these applications requires programming paradigms that can effectively handle the specification of concurrent…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-21 Luis Miguel Pinho

The Feasibility of Using OpenCL Instead of OpenMP for Parallel CPU Programming

OpenCL, along with CUDA, is one of the main tools used to program GPGPUs. However, it allows running the same code on multi-core CPUs too, making it a rival for the long-established OpenMP. In this paper we compare OpenCL and OpenMP when…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-03-24 Kamran Karimi

DepGraph: Localizing Performance Bottlenecks in Multi-Core Applications Using Waiting Dependency Graphs and Software Tracing

This paper addresses the challenge of understanding the waiting dependencies between the threads and hardware resources required to complete a task. The objective is to improve software performance by detecting the underlying bottlenecks…

Software Engineering · Computer Science 2021-03-09 Naser Ezzati-Jivan, Quentin Fournier, Michel R. Dagenais, Abdelwahab Hamou-Lhadj

Towards Autotuning of OpenMP Applications on Multicore Architectures

In this paper we describe an autotuning tool for optimization of OpenMP applications on highly multicore and multithreaded architectures. Our work was motivated by in-depth performance analysis of scientific applications and synthetic…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-01-17 Jakub Katarzyński, Maciej Cytowski

Frustrated with MPI+Threads? Try MPIxThreads!

MPI+Threads, embodied by the MPI/OpenMP hybrid programming model, is a parallel programming paradigm where threads are used for on-node shared-memory parallelization and MPI is used for multi-node distributed-memory parallelization. OpenMP…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-31 Hui Zhou, Ken Raffenetti, Junchao Zhang, Yanfei Guo, Rajeev Thakur

Parallel Performance of MPI Sorting Algorithms on Dual-Core Processor Windows-Based Systems

Message Passing Interface (MPI) is widely used to implement parallel programs. Although Windows-based architectures provide the facilities of parallel execution and multi-threading, little attention has been focused on using MPI on these…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-05-31 Alaa Ismail Elnashar