Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization

Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Machine Learning (2025-09-30, v1)

Abstract

This study presents a machine learning-assisted approach to optimize task scheduling in cluster systems, focusing on node-affinity constraints. Traditional schedulers like Kubernetes struggle with real-time adaptability, whereas the proposed continuous transfer learning model evolves dynamically during operations, minimizing retraining needs. Evaluated on Google Cluster Data, the model achieves over 99% accuracy, reducing computational overhead and improving scheduling latency for constrained tasks. This scalable solution enables real-time optimization, advancing machine learning integration in cluster management and paving the way for future adaptive scheduling strategies.

Cite

@article{arxiv.2509.22701,
  title   = {Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization},
  author  = {Leszek Sliwko and Jolanta Mizera-Pietraszko},
  journal = {arXiv preprint arXiv:2509.22701},
  year    = {2025}
}

Comments

This is the accepted version of the paper published in 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). The final version is available at: https://doi.org/10.1109/IPDPSW66978.2025.00056