STEP: Staged Parameter-Efficient Pre-training for Large Language Models

Computation and Language 2025-04-08 v1

Abstract

Pre-training large language models (LLMs) faces significant memory challenges due to the large number of model parameters. We introduce STaged parameter-Efficient Pre-training (STEP), which integrates parameter-efficient tuning techniques with model growth. We conduct experiments on pre-training LLMs of various sizes and demonstrate that STEP reduces maximum memory requirements by up to 53.9% compared to vanilla pre-training while maintaining equivalent performance. Furthermore, we show that models pre-trained with STEP perform comparably to vanilla pre-trained models on downstream tasks after instruction tuning.

Cite

@article{arxiv.2504.04151,
  title  = {STEP: Staged Parameter-Efficient Pre-training for Large Language Models},
  author = {Kazuki Yano and Takumi Ito and Jun Suzuki},
  journal= {arXiv preprint arXiv:2504.04151},
  year   = {2025}
}

Comments

Accepted to NAACL 2025 Main
