Related papers: Large Scale Parallelization Using File-Based Commu…

Improving the Performance and Resilience of MPI Parallel Jobs with Topology and Fault-Aware Process Placement

HPC systems keep growing in size to meet the ever-increasing demand for performance and computational resources. Apart from increased performance, large scale systems face two challenges that hinder further growth: energy efficiency and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-06 Ioannis Vardas, Manolis Ploumidis, Manolis Marazakis

A local parallel communication algorithm for polydisperse rigid body dynamics

The simulation of large ensembles of particles is usually parallelized by partitioning the domain spatially and using message passing to communicate between the processes handling neighboring subdomains. The particles are represented as…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-03 Sebastian Eibl, Ulrich Rüde

MPI-over-CXL: Enhancing Communication Efficiency in Distributed HPC Systems

MPI implementations commonly rely on explicit memory-copy operations, incurring overhead from redundant data movement and buffer management. This overhead notably impacts HPC workloads involving intensive inter-processor communication. In…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-17 Miryeong Kwon, Donghyun Gouk, Hyein Woo, Junhee Kim, Jinwoo Baek, Kyungkuk Nam, Sangyoon Ji, Jiseon Kim, Hanyeoreum Bae, Junhyeok Jang, Hyunwoo You, Junseok Moon, Myoungsoo Jung

Open-MPI over MOSIX: paralleled computing in a clustered world

Recent increased interest in Cloud computing emphasizes the need to find an adequate solution to the load-balancing problem in parallel computing -- efficiently running several jobs concurrently on a cluster of shared computers (nodes). One…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-02 Adam Lev-Libfeld, Alex Margolin, Amnon Barak

To Parallelize or Not to Parallelize, Speed Up Issue

Running parallel applications requires special and expensive processing resources to obtain the required results within a reasonable time. Before parallelizing serial applications, some analysis is recommended to be carried out to decide…

Software Engineering · Computer Science 2011-03-30 Alaa Ismail Elnashar

A Novel Process Mapping Strategy in Clustered Environments

Nowadays, the number of available processing cores within the computing nodes used in recent clustered environments is growing at a rapid rate. Despite this trend, the number of available network interfaces in such computing…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-13 Mohsen Soryani, Morteza Analoui, Ghobad Zarrinchian

A parallel dual-grid multiscale approach to CFD-DEM couplings

In this work, a new parallel dual-grid multiscale approach for CFD-DEM couplings is investigated. Dual-grid multiscale CFD-DEM couplings have been recently developed and successfully adopted in different applications; still, an efficient…

Computational Engineering, Finance, and Science · Computer Science 2018-12-26 Gabriele Pozzetti, Hrvoje Jasak, Xavier Besseron, Alban Rousset, Bernhard Peters

Transaction Level Analysis for a Clustered and Hardware-Enhanced Task Manager on Homogeneous Many-Core Systems

The increasing parallelism of many-core systems demands efficient strategies for run-time system management. Due to the large number of cores, the management overhead has a rising impact on the overall system performance. This work…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-02-11 Daniel Gregorek, Robert Schmidt, Alberto Garcia-Ortiz

Emulating a large memory with a collection of small ones

Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…

Hardware Architecture · Computer Science 2015-11-17 James Hanlon

Experiences Porting Distributed Applications to Asynchronous Tasks: A Multidimensional FFT Case-study

Parallel algorithms relying on synchronous parallelization libraries often experience adverse performance due to global synchronization barriers. Asynchronous many-task runtimes offer task futurization capabilities that minimize or remove…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-05 Alexander Strack, Christopher Taylor, Patrick Diehl, Dirk Pflüger

ISO: Overlap of Computation and Communication within Seqenence For LLM Inference

In the realm of Large Language Model (LLM) inference, the inherent structure of transformer models coupled with the multi-GPU tensor parallelism strategy leads to a sequential execution of computation and communication. This results in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Bin Xiao, Lei Su

A More Scalable Sparse Dynamic Data Exchange

Parallel architectures are continually increasing in performance and scale, while the underlying algorithmic infrastructure often fails to take full advantage of available compute power. Within the context of MPI, irregular communication…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-04 Andrew Geyko, Gerald Collom, Derek Schafer, Patrick Bridges, Amanda Bienz

Supporting Parallelism in Server-based Multiprocessor Systems

Developing an efficient server-based real-time scheduling solution that supports dynamic task-level parallelism is now relevant to even the desktop and embedded domains and no longer only to the high performance computing market niche. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-06-15 Luís Nogueira, Luís Miguel Pinho

Analysis of Distributed Algorithms for Big-data

Parallel and distributed processing is becoming the de facto industry standard, and a large part of the current research is targeted on how to make computing scalable and distributed, dynamically, without allocating the resources on…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-10 Rajendra Purohit, K R Chowdhary, S D Purohit

Distributed Rate Scaling in Large-Scale Service Systems

We consider a large-scale parallel-server system, where each server independently adjusts its processing speed in a decentralized manner. The objective is to minimize the overall cost, which comprises the average cost of maintaining the…

Optimization and Control · Mathematics 2023-06-06 Daan Rutten, Martin Zubeldia, Debankur Mukherjee

Optimizing Scalable Multi-Cluster Architectures for Next-Generation Wireless Sensing and Communication

Next-generation wireless technologies (for immersive-massive communication, joint communication and sensing) demand highly parallel architectures for massive data processing. A common architectural template scales up by grouping tens to…

Hardware Architecture · Computer Science 2025-07-08 Samuel Riedel, Yichao Zhang, Marco Bertuletti, Luca Benini

Performance Evaluation of Parallel Message Passing and Thread Programming Model on Multicore Architectures

The current trend of multicore architectures on shared memory systems underscores the need for parallelism. While there are several programming models to express parallelism, the thread programming model has become a standard to support these system…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-12-13 D. T. Hasta, A. B. Mutiara

GPU-centric Communication Schemes for HPC and ML Applications

Compute nodes on modern heterogeneous supercomputing systems comprise CPUs, GPUs, and high-speed network interconnects (NICs). Parallelization is identified as a technique for effectively utilizing these systems to execute scalable…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-01 Naveen Namashivayam

Rethinking Inter-Process Communication with Memory Operation Offloading

As multimodal and AI-driven services exchange hundreds of megabytes per request, existing IPC runtimes spend a growing share of CPU cycles on memory copies. Although both hardware and software mechanisms are exploring memory offloading,…

Operating Systems · Computer Science 2026-01-13 Misun Park, Richi Dubey, Yifan Yuan, Nam Sung Kim, Ada Gavrilovska

Scaling Ordered Stream Processing on Shared-Memory Multicores

Many modern applications require real-time processing of large volumes of high-speed data. Such data processing needs can be modeled as a streaming computation. A streaming computation is specified as a dataflow graph that exposes multiple…

Databases · Computer Science 2018-04-02 Guna Prasaad, G. Ramalingam, Kaushik Rajan