Related papers: Periodic I/O scheduling for super-computers

Job Scheduling in High Performance Computing

The ever-growing processing power of supercomputers in recent decades enables us to explore increasing complex scientific problems. Effective scheduling these jobs is crucial for individual job performance and system efficiency. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-21 Yuping Fan

Mitigating Shared Storage Congestion Using Control Theory

Efficient data access in High-Performance Computing (HPC) systems is essential to the performance of intensive computing tasks. Traditional optimizations of the I/O stack aim to improve peak performance but are often workload specific and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-21 Thomas Collignon , Kouds Halitim , Raphaël Bleuse , Sophie Cerf , Bogdan Robu , Éric Rutten , Lionel Seinturier , Alexandre van Kempen

Scheduling Jobs with Random Resource Requirements in Computing Clusters

We consider a natural scheduling problem which arises in many distributed computing frameworks. Jobs with diverse resource requirements (e.g. memory requirements) arrive over time and must be served by a cluster of servers, each with a…

Networking and Internet Architecture · Computer Science 2019-01-21 Konstantinos Psychas , Javad Ghaderi

Probabilistic Scheduling of Dynamic I/O Requests via Application Clustering for Burst-Buffer Equipped HPC

Burst-Buffering is a promising storage solution that introduces an intermediate highthroughput storage buffer layer to mitigate the I/O bottleneck problem that the current High-Performance Computing (HPC) platforms suffer. The existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-17 Benbo Zha , Hong Shen

Centralized Congestion Control and Scheduling in a Datacenter

We consider the problem of designing a packet-level congestion control and scheduling policy for datacenter networks. Current datacenter networks primarily inherit the principles that went into the design of Internet, where congestion…

Networking and Internet Architecture · Computer Science 2017-10-10 Devavrat Shah , Qiaomin Xie

Scheduling Beyond CPUs for HPC

High performance computing (HPC) is undergoing significant changes. The emerging HPC applications comprise both compute- and data-intensive applications. To meet the intense I/O demand from emerging data-intensive applications, burst…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-11 Yuping Fan , Zhiling Lan , Paul Rich , William E. Allcock , Michael E. Papka , Brian Austin , David Paul

Scheduling for a Processor Sharing System with Linear Slowdown

We consider the problem of scheduling arrivals to a congestion system with a finite number of users having identical deterministic demand sizes. The congestion is of the processor sharing type in the sense that all users in the system at…

Optimization and Control · Mathematics 2017-04-12 Liron Ravner , Yoni Nazarathy

Effective Handling of Urgent Jobs - Speed Up Scheduling for Computing Applications

A queue is required when a service provider is not able to handle jobs arriving over the time. In a highly flexible and dynamic environment, some jobs might demand for faster execution at run-time especially when the resources are limited…

Performance · Computer Science 2015-03-24 Yash Gupta , Kamalakar Karlapalem

A Case Study: Task Scheduling Methodologies for High Speed Computing Systems

High Speed computing meets ever increasing real-time computational demands through the leveraging of flexibility and parallelism. The flexibility is achieved when computing platform designed with heterogeneous resources to support…

Operating Systems · Computer Science 2015-01-08 Mahendra Vucha , Arvind Rajawat

Dynamic scheduling of virtual machines running hpc workloads in scientific grids

The primary motivation for uptake of virtualization has been resource isolation, capacity management and resource customization allowing resource providers to consolidate their resources in virtual machines. Various approaches have been…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-09-27 Omer Khalid , Ivo Maljevic , Richard Anthony , Miltos Petridis , Kevin Parrot , Markus Schulz

Co-Scheduling Algorithms for High-Throughput Workload Execution

This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several…

Data Structures and Algorithms · Computer Science 2013-05-01 Guillaume Aupy , Manu Shantharam , Anne Benoit , Yves Robert , Padma Raghavan

Scheduling Parallel-Task Jobs Subject to Packing and Placement Constraints

Motivated by modern parallel computing applications, we consider the problem of scheduling parallel-task jobs with heterogeneous resource requirements in a cluster of machines. Each job consists of a set of tasks that can be processed in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-03 Mehrnoosh Shafiee , Javad Ghaderi

Mixed-Criticality Scheduling with I/O

This paper addresses the problem of scheduling tasks with different criticality levels in the presence of I/O requests. In mixed-criticality scheduling, higher criticality tasks are given precedence over those of lower criticality when it…

Operating Systems · Computer Science 2016-03-15 Eric Missimer , Katherine Zhao , Richard West

A HPC Co-Scheduler with Reinforcement Learning

Although High Performance Computing (HPC) users understand basic resource requirements such as the number of CPUs and memory limits, internal infrastructural utilization data is exclusively leveraged by cluster operators, who use it to…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-19 Abel Souza , Kristiaan Pelckmans , Johan Tordsson

Improving Overhead Computation and pre-processing Time for Grid Scheduling System

Computational Grid is enormous environments with heterogeneous resources and stable infrastructures among other Internet-based computing systems. However, the managing of resources in such systems has its special problems. Scheduler systems…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-05-07 Asgarali Bouyer , Mohammad Javad hoseyni , Abdul Hanan Abdullah

Rethinking Inter-Process Communication with Memory Operation Offloading

As multimodal and AI-driven services exchange hundreds of megabytes per request, existing IPC runtimes spend a growing share of CPU cycles on memory copies. Although both hardware and software mechanisms are exploring memory offloading,…

Operating Systems · Computer Science 2026-01-13 Misun Park , Richi Dubey , Yifan Yuan , Nam Sung Kim , Ada Gavrilovska

Scheduling Coflows in Multi-Core OCS Networks with Performance Guarantee

Coflow provides a key application-layer abstraction for capturing communication patterns, enabling the efficient coordination of parallel data flows to reduce job completion times in distributed systems. Modern data center networks (DCNs)…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-10 Xin Wang , Hong Shen , Hui Tian , Dong Wang

Capturing Periodic I/O Using Frequency Techniques

Many HPC applications perform their I/O in bursts that follow a periodic pattern. This allows for making predictions as to when a burst occurs. System providers can take advantage of such knowledge to reduce file-system contention by…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-17 Ahmad Tarraf , Alexis Bandet , Francieli Boito , Guillaume Pallez , Felix Wolf

Hybrid Workload Scheduling on HPC Systems

Traditionally, on-demand, rigid, and malleable applications have been scheduled and executed on separate systems. The ever-growing workload demands and rapidly developing HPC infrastructure trigger the interest of converging these…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-14 Yuping Fan , Paul Rich , William Allcock , Michael Papka , Zhiling Lan

Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization

This study presents a machine learning-assisted approach to optimize task scheduling in cluster systems, focusing on node-affinity constraints. Traditional schedulers like Kubernetes struggle with real-time adaptability, whereas the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-30 Leszek Sliwko , Jolanta Mizera-Pietraszko