Related papers: Towards Enabling I/O Awareness in Task-based Progr…

Introducing the Task-Aware Storage I/O (TASIO) Library

Task-based programming models are excellent tools to parallelize and seamlessly load balance an application workload. However, the integration of I/O intensive applications and task-based programming models is lacking. Typically, I/O…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-30 Aleix Roca Nonell , Vicenç Beltran Querol , Sergi Mateo Bellido

Scheduling Task-parallel Applications in Dynamically Asymmetric Environments

Shared resource interference is observed by applications as dynamic performance asymmetry. Prior art has developed approaches to reduce the impact of performance asymmetry mainly at the operating system and architectural levels. In this…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-24 Jing Chen , Pirah Noor Soomro , Mustafa Abduljabbar , Madhavan Manivannan , Miquel Pericas

ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems

Parallel applications can spend a significant amount of time performing I/O on large-scale supercomputers. Fast near-compute storage accelerators called burst buffers can reduce the time a processor spends performing I/O and mitigate I/O…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-15 Yiheng Xu , Pranav Sivaraman , Hariharan Devarajan , Kathryn Mohror , Abhinav Bhatele

Tools for Analyzing Parallel I/O

Parallel application I/O performance often does not meet user expectations. Additionally, slight access pattern modifications may lead to significant changes in performance due to complex interactions between hardware and software. These…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-19 Julian M. Kunkel , Eugen Betke , Matt Bryson , Philip Carns , Rosemary Francis , Wolfgang Frings , Roland Laifer , Sandra Mendez

Memory Aware Load Balance Strategy on a Parallel Branch-and-Bound Application

The latest trends in high-performance computing systems show an increasing demand on the use of a large scale multicore systems in a efficient way, so that high compute-intensive applications can be executed reasonably well. However, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-25 Juliana M. N. Silva , Cristina Boeres , Lúcia M. A. Drummond , Artur A. Pessoa

Mitigating Shared Storage Congestion Using Control Theory

Efficient data access in High-Performance Computing (HPC) systems is essential to the performance of intensive computing tasks. Traditional optimizations of the I/O stack aim to improve peak performance but are often workload specific and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-21 Thomas Collignon , Kouds Halitim , Raphaël Bleuse , Sophie Cerf , Bogdan Robu , Éric Rutten , Lionel Seinturier , Alexandre van Kempen

Reduced I/O Latency with Futures

Task parallelism research has traditionally focused on optimizing computation-intensive applications. Due to the proliferation of commodity parallel processors, there has been recent interest in supporting interactive applications. Such…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-20 Kyle Singer , Kunal Agrawal , I-Ting Angelina Lee

Providing High and Controllable Performance in Multicore Systems Through Shared Resource Management

Multiple applications executing concurrently on a multicore system interfere with each other at different shared resources such as main memory and shared caches. Such inter-application interference, if uncontrolled, results in high system…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-08-14 Lavanya Subramanian

Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

There are billions of lines of sequential code inside nowadays' software which do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the…

Programming Languages · Computer Science 2016-04-13 Alcides Fonseca , Bruno Cabral , João Rafael , Ivo Correia

Periodic I/O scheduling for super-computers

With the ever-growing need of data in HPC applications, the congestion at the I/O level becomes critical in super-computers. Architectural enhancement such as burst-buffers and pre-fetching are added to machines, but are not sufficient to…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-23 Guillaume Aupy , Ana Gainaru , Valentin Le Fèvre

Parallel I/O Characterization and Optimization on Large-Scale HPC Systems: A 360-Degree Survey

Driven by artificial intelligence, data science, and high-resolution simulations, I/O workloads and hardware on high-performance computing (HPC) systems have become increasingly complex. This complexity can lead to large I/O overheads and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-03 Hammad Ather , Jean Luca Bez , Chen Wang , Hank Childs , Allen D. Malony , Suren Byna

Proactive bottleneck performance analysis in parallel computing using openMP

The aim of parallel computing is to increase an application performance by executing the application on multiple processors. OpenMP is an API that supports multi platform shared memory programming model and shared-memory programs are…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-12 Vibha Rajput , Alok Katiyar

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures: A Machine Learning Based Approach

This article presents an automatic approach to quickly derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures. Our approach employs a…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-10 Peng Zhang , Jianbin Fang , Canqun Yang , Chun Huang , Tao Tang , Zheng Wang

On the Benefits of Anticipating Load Imbalance for Performance Optimization of Parallel Applications

In parallel iterative applications, computational efficiency is essential for addressing large problems. Load imbalance is one of the major performance degradation factors of parallel applications. Therefore, distributing, cleverly, and as…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-18 Anthony Boulmier , Franck Raynaud , Nabil Abdennadher , Bastien Chopard

Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System

Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach. Taskflow introduces an expressive task graph programming model to assist developers in the implementation of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-08 Tsung-Wei Huang , Dian-Lun Lin , Chun-Xun Lin , Yibo Lin

Problems in Modern High Performance Parallel I/O Systems

In the past couple of decades, the computational abilities of supercomput- ers have increased tremendously. Leadership scale supercomputers now are capable of petaflops. Likewise, the problem size targeted by applications running on such…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-09-06 Robert Louis Cloud

CkIO: Parallel File Input for Over-Decomposed Task-Based Systems

Parallel input performance issues are often neglected in large scale parallel applications in Computational Science and Engineering. Traditionally, there has been less focus on input performance because either input sizes are small (as in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-02 Mathew Jacob , Maya Taylor , Laxmikant Kale

Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven Approach to Storage Optimization

Modern machine learning training is increasingly bottlenecked by data I/O rather than compute. GPUs often sit idle at below 50% utilization waiting for data. This paper presents a machine learning approach to predict I/O performance and…

Performance · Computer Science 2025-12-22 Karthik Prabhakar , Durgamadhab Mishra

Tightening I/O Lower Bounds through the Hourglass Dependency Pattern

When designing an algorithm, one cares about arithmetic/computational complexity, but data movement (I/O) complexity plays an increasingly important role that highly impacts performance and energy consumption. For a given algorithm and a…

Computational Complexity · Computer Science 2024-04-26 Lionel Eyraud-Dubois , Guillaume Iooss , Julien Langou , Fabrice Rastello

CAWL: A Cache-aware Write Performance Model of Linux Systems

The performance of data intensive applications is often dominated by their input/output (I/O) operations but the I/O stack of systems is complex and severely depends on system specific settings and hardware components. This situation makes…

Performance · Computer Science 2023-06-12 Masoud Gholami , Florian Schintke