Related papers: CkIO: Parallel File Input for Over-Decomposed Task…

Efficient and Portable Support for Overdecomposition on Distributed Memory GPGPU Platforms

Overdecomposition has emerged as a powerful and sometimes essential technique in parallel programming. Many application domains or frameworks, including those based on adaptive mesh refinements or tree codes, use it. Charm++ is a parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-14 Aditya Bhosale, Anant Jain, Shourya Goel, Ritvik Rao, Peddoju Sateesh Kumar, Laxmikant Kale

Introducing the Task-Aware Storage I/O (TASIO) Library

Task-based programming models are excellent tools to parallelize and seamlessly load balance an application workload. However, the integration of I/O intensive applications and task-based programming models is lacking. Typically, I/O…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-30 Aleix Roca Nonell, Vicenç Beltran Querol, Sergi Mateo Bellido

Towards an Adaptive Runtime System for Cloud-Native HPC

The ongoing convergence of HPC and cloud computing presents a fundamental challenge: HPC applications, designed for static and homogeneous supercomputers, are ill-suited for the dynamic, heterogeneous, and volatile nature of the cloud.…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-17 Aditya Bhosale, Advait Tahilyani, Laxmikant Kale, Sara Kokkila-Schumacher

Towards Enabling I/O Awareness in Task-based Programming Models

Storage systems have not kept the same technology improvement rate as computing systems. As applications produce more and more data, I/O becomes the limiting factor for increasing application performance. I/O congestion caused by concurrent…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-03 Hatem Elshazly, Jorge Ejarque, Francesc Lordan, Rosa M. Badia

Optimizing Noncontiguous Accesses in MPI-IO

The I/O access patterns of many parallel applications consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Rajeev Thakur, William Gropp, Ewing Lusk

Design and Development of a Java Parallel I/O Library

Parallel I/O refers to the ability of scientific programs to concurrently read/write from/to a single file from multiple processes executing on distributed memory platforms like compute clusters. In the HPC world, I/O becomes a significant…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-15 Muhammad Sohaib Ayub, Muhammad Adnan, Muhammad Yasir Shafi

Parallel I/O Characterization and Optimization on Large-Scale HPC Systems: A 360-Degree Survey

Driven by artificial intelligence, data science, and high-resolution simulations, I/O workloads and hardware on high-performance computing (HPC) systems have become increasingly complex. This complexity can lead to large I/O overheads and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-03 Hammad Ather, Jean Luca Bez, Chen Wang, Hank Childs, Allen D. Malony, Suren Byna

Preparing HPC Applications for the Exascale Era: A Decoupling Strategy

Production-quality parallel applications are often a mixture of diverse operations, such as computation- and communication-intensive, regular and irregular, tightly coupled and loosely linked operations. In conventional construction of…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-08-07 Ivy Bo Peng, Roberto Gioiosa, Gokcen Kestor, Erwin Laure, Stefano Markidis

A Task-Parallel Approach for Localized Topological Data Structures

Unstructured meshes are characterized by data points irregularly distributed in the Euclidean space. Due to the irregular nature of these data, computing connectivity information between the mesh elements requires much more time and memory…

Data Structures and Algorithms · Computer Science 2025-04-03 Guoxi Liu, Federico Iuricich

A Comparative Study of Asynchronous Many-Tasking Runtimes: Cilk, Charm++, ParalleX and AM++

We evaluate and compare four contemporary and emerging runtimes for high-performance computing (HPC) applications: Cilk, Charm++, ParalleX and AM++. We compare along three bases: programming model, execution model and the implementation on…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-02 Abhishek Kulkarni, Andrew Lumsdaine

Quantifying Overheads in Charm++ and HPX using Task Bench

Asynchronous Many-Task (AMT) runtime systems take advantage of multi-core architectures with light-weight threads, asynchronous executions, and smart scheduling. In this paper, we present the comparison of the AMT systems Charm++ and HPX…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-04 Nanmiao Wu, Ioannis Gonidelis, Simeng Liu, Zane Fink, Nikunj Gupta, Karame Mohammadiporshokooh, Patrick Diehl, Hartmut Kaiser, Laxmikant V. Kale

Problems in Modern High Performance Parallel I/O Systems

In the past couple of decades, the computational abilities of supercomputers have increased tremendously. Leadership scale supercomputers now are capable of petaflops. Likewise, the problem size targeted by applications running on such…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-09-06 Robert Louis Cloud

ReSHAPE: A Framework for Dynamic Resizing and Scheduling of Homogeneous Applications in a Parallel Environment

Applications in science and engineering often require huge computational resources for solving problems within a reasonable time frame. Parallel supercomputers provide the computational infrastructure for solving such problems. A…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Rajesh Sudarsan, Calvin J. Ribbens

Dynamic Load Balancing in GPU-Based Systems - Early Experiments

The dynamic load-balancing framework in Charm++/AMPI, developed at the University of Illinois, is based on using processor virtualization to allow thread migration across processors. This framework has been successfully applied to many…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-10-17 Alvaro Luiz Fazenda, Celso L. Mendes, Laxmikant V. Kale, Jairo Panetta, Eduardo Rocha Rodrigues

ViPIOS - VIenna Parallel Input Output System: Language, Compiler and Advanced Data Structure Support for Parallel I/O Operations

For an increasing number of data intensive scientific applications, parallel I/O concepts are a major performance issue. Tackling this issue, we develop an input/output system designed for highly efficient, scalable and conveniently usable…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-06 Erich Schikuta, Helmut Wanek, Heinz Stockinger, Kurt Stockinger, Thomas Fürle, Oliver Jorns, Christoph Löffelhardt, Peter Brezany, Minh Dang, Thomas Mück

DeCo: Defect-Aware Modeling with Contrasting Matching for Optimizing Task Assignment in Online IC Testing

In the semiconductor industry, integrated circuit (IC) processes play a vital role, as the rising complexity and market expectations necessitate improvements in yield. Identifying IC defects and assigning IC testing tasks to the right…

Artificial Intelligence · Computer Science 2025-06-04 Lo Pang-Yun Ting, Yu-Hao Chiang, Yi-Tung Tsai, Hsu-Chao Lai, Kun-Ta Chuang

ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems

Parallel applications can spend a significant amount of time performing I/O on large-scale supercomputers. Fast near-compute storage accelerators called burst buffers can reduce the time a processor spends performing I/O and mitigate I/O…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-15 Yiheng Xu, Pranav Sivaraman, Hariharan Devarajan, Kathryn Mohror, Abhinav Bhatele

The OpenMP Cluster Programming Model

Despite the various research initiatives and proposed programming models, efficient solutions for parallel programming in HPC clusters still rely on a complex combination of different programming models (e.g., OpenMP and MPI), languages…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-16 Hervé Yviquel, Marcio Pereira, Emílio Francesquini, Guilherme Valarini, Gustavo Leite, Pedro Rosso, Rodrigo Ceccato, Carla Cusihualpa, Vitoria Dias, Sandro Rigo, Alan Souza, Guido Araujo

Improving the Performance and Resilience of MPI Parallel Jobs with Topology and Fault-Aware Process Placement

HPC systems keep growing in size to meet the ever-increasing demand for performance and computational resources. Apart from increased performance, large scale systems face two challenges that hinder further growth: energy efficiency and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-06 Ioannis Vardas, Manolis Ploumidis, Manolis Marazakis

Cache-Conscious Run-time Decomposition of Data Parallel Computations

Multi-core architectures feature an intricate hierarchy of cache memories, with multiple levels and sizes. To adequately decompose an application according to the traits of a particular memory hierarchy is a cumbersome task that may be…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-20 Hervé Paulino, Nuno Delgado