ProTrain: Efficient LLM Training via Memory-Aware Techniques

Distributed, Parallel, and Cluster Computing 2026-04-21 v2 Artificial Intelligence Machine Learning Performance

Abstract

Memory pressure has emerged as a dominant constraint in scaling the training of large language models (LLMs), particularly in resource-constrained environments. While modern frameworks incorporate various memory-saving techniques, they often expose low-level configuration knobs that require manual tuning and specialized system expertise. This not only adds engineering overhead but also risks suboptimal hardware utilization when misconfigured. This paper introduces ProTrain, a novel training system that automatically tailors memory management policies to the model architecture and underlying hardware resources, eliminating the need for manual intervention. At its core, ProTrain abstracts complex memory management strategies into a few tunable configuration parameters, which allows it to search for optimal parameter settings using cost models. ProTrain is equipped with a runtime profiler that provides precise estimates of latency, memory usage, and I/O bandwidth to build high-fidelity cost models. ProTrain does not change the training algorithm and thus does not compromise accuracy. Experiments show that ProTrain improves training throughput by 1.43× to 2.71× compared to state-of-the-art training systems.

Cite

@article{arxiv.2406.08334,
  title  = {ProTrain: Efficient LLM Training via Memory-Aware Techniques},
  author = {Hanmei Yang and Jin Zhou and Yao Fu and Xiaoqun Wang and Ramine Roane and Hui Guan and Tongping Liu},
  journal= {arXiv preprint arXiv:2406.08334},
  year   = {2026}
}

Comments

Accepted to MLSys 2026