Related papers: Callback-based Completion Notification using MPI C…

Fibers are not (P)Threads: The Case for Loose Coupling of Asynchronous Programming Models and MPI Through Continuations

Asynchronous programming models (APM) are gaining more and more traction, allowing applications to expose the available concurrency to a runtime system tasked with coordinating the execution. While MPI has long provided support for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-12-23 · Joseph Schuchart, Christoph Niethammer, José Gracia

Extending the Message Passing Interface (MPI) with User-Level Schedules

Composability is one of seven reasons for the long-standing and continuing success of MPI. Extending MPI by composing its operations with user-level operations provides useful integration with the progress engine and completion notification…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-09-27 · Derek Schafer, Sheikh Ghafoor, Daniel Holmes, Martin Ruefenacht, Anthony Skjellum

MPI Progress For All

The progression of communication in the Message Passing Interface (MPI) is not well defined, yet it is critical for application performance, particularly in achieving effective computation and communication overlap. The opaque nature of MPI…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-07-16 · Hui Zhou, Robert Latham, Ken Raffenetti, Yanfei Guo, Rajeev Thakur

Examining MPI and its Extensions for Asynchronous Multithreaded Communication

The increasing complexity of HPC architectures and the growing adoption of irregular scientific algorithms demand efficient support for asynchronous, multithreaded communication. This need is especially pronounced with Asynchronous…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-08-27 · Jiakun Yan, Marc Snir, Yanfei Guo

PGMPI: Automatically Verifying Self-Consistent MPI Performance Guidelines

The Message Passing Interface (MPI) is the most commonly used application programming interface for process communication on current large-scale parallel systems. Due to the scale and complexity of modern parallel architectures, it is…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-09-05 · Sascha Hunold, Alexandra Carpen-Amarie, Felix Donatus Lübbe, Jesper Larsson Träff

MPI Advance: Open-Source Message Passing Optimizations

The large variety of production implementations of the message passing interface (MPI) each provide unique and varying underlying algorithms. Each emerging supercomputer supports one or a small number of system MPI installations, tuned for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-09-15 · Amanda Bienz, Derek Schafer, Anthony Skjellum

Performance Evaluation of Parallel Message Passing and Thread Programming Model on Multicore Architectures

The current trend of multicore architectures on shared memory systems underscores the need for parallelism. While there are several programming models to express parallelism, the thread programming model has become a standard to support these system…

Distributed, Parallel, and Cluster Computing · Computer Science · 2010-12-13 · D. T. Hasta, A. B. Mutiara

Frustrated with MPI+Threads? Try MPIxThreads!

MPI+Threads, embodied by the MPI/OpenMP hybrid programming model, is a parallel programming paradigm where threads are used for on-node shared-memory parallelization and MPI is used for multi-node distributed-memory parallelization. OpenMP…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-01-31 · Hui Zhou, Ken Raffenetti, Junchao Zhang, Yanfei Guo, Rajeev Thakur

PartRePer-MPI: Combining Fault Tolerance and Performance for MPI Applications

As we have entered Exascale computing, the faults in high-performance systems are expected to increase considerably. To compensate for a higher failure rate, the standard checkpoint/restart technique would need to create checkpoints at a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-10-26 · Sarthak Joshi, Sathish Vadhiyar

Integrating Blocking and Non-Blocking MPI Primitives with Task-Based Programming Models

In this paper we present the Task-Aware MPI library (TAMPI) that integrates both blocking and non-blocking MPI primitives with task-based programming models. The TAMPI library leverages two new runtime APIs to improve both programmability…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-06-01 · Kevin Sala, Xavier Teruel, Josep M. Perez, Antonio J. Peña, Vicenç Beltran, Jesus Labarta

Collectives in hybrid MPI+MPI code: design, practice and performance

The use of a hybrid scheme combining the message-passing programming model for inter-node parallelism and the shared-memory programming model for node-level parallelism is widespread. Existing extensive practices on hybrid Message…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-07-23 · Huan Zhou, Jose Gracia, Naweiluo Zhou, Ralf Schneider

Implementing Efficient Message Logging Protocols as MPI Application Extensions

Message logging protocols are enablers of local rollback, a more efficient alternative to global rollback, for fault tolerant MPI applications. Until now, message logging MPI implementations have incurred the overheads of a redesign and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-05-09 · Kiril Dichev, Dimitrios S. Nikolopoulos

MPI Collectives for Multi-core Clusters: Optimized Performance of the Hybrid MPI+MPI Parallel Codes

The advent of multi-/many-core processors in clusters advocates hybrid parallel programming, which combines Message Passing Interface (MPI) for inter-node parallelism with a shared memory model for on-node parallelism. Compared to the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-07-15 · Huan Zhou, Jose Gracia, Ralf Schneider

Learning from the Success of MPI

The Message Passing Interface (MPI) has been extremely successful as a portable way to program high-performance parallel computers. This success has occurred in spite of the view of many that message passing is difficult and that other…

Distributed, Parallel, and Cluster Computing · Computer Science · 2007-05-23 · William D. Gropp

MPI Implementation Profiling for Better Application Performance

While application profiling has been a mainstay in the HPC community for years, profiling of MPI and other communication middleware has not received the same degree of exploration. This paper adds to the discussion of MPI profiling,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-02-20 · Riley Shipley, Garrett Hooten, David Boehme, Derek Schafer, Anthony Skjellum, Olga Pearce

Designing and Prototyping Extensions to MPI in MPICH

As HPC system architectures and the applications running on them continue to evolve, the MPI standard itself must evolve. The trend in current and future HPC systems toward powerful nodes with multiple CPU cores and multiple GPU…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-02-20 · Hui Zhou, Ken Raffenetti, Yanfei Guo, Thomas Gillis, Robert Latham, Rajeev Thakur

Lessons Learned on MPI+Threads Communication

Hybrid MPI+threads programming is gaining prominence, but, in practice, applications perform slower with it compared to the MPI everywhere model. The most critical challenge to the parallel efficiency of MPI+threads applications is slow…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-06-30 · Rohit Zambre, Aparna Chandramowlishwaran

Extending Message Passing Interface Windows to Storage

This work presents an extension to MPI supporting the one-sided communication model and window allocations in storage. Our design transparently integrates with the current MPI implementations, enabling applications to target MPI windows in…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-04-28 · Sergio Rivas-Gomez, Stefano Markidis, Ivy Bo Peng, Erwin Laure, Gokcen Kestor, Roberto Gioiosa

Quo Vadis MPI RMA? Towards a More Efficient Use of MPI One-Sided Communication

The MPI standard has long included one-sided communication abstractions through the MPI Remote Memory Access (RMA) interface. Unfortunately, the MPI RMA chapter in the 4.0 version of the MPI standard still contains both well-known and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-11-17 · Joseph Schuchart, Christoph Niethammer, José Gracia, George Bosilca

A Hybrid Parallelization of AIM for Multi-Core Clusters: Implementation Details and Benchmark Results on Ranger

This paper presents implementation details and empirical results for a hybrid message passing and shared memory parallelization of the adaptive integral method (AIM). AIM is implemented on a (near) petaflop supercomputing cluster of…

Computational Engineering, Finance, and Science · Computer Science · 2010-10-08 · Fangzhou Wei, Ali E. Yılmaz