Memory Efficient Mixed-Precision Optimizers

Machine Learning 2023-09-25 v1

Abstract

Traditional optimization methods rely on single-precision floating point arithmetic, which is costly in both memory and compute. Mixed-precision optimization techniques instead combine single- and half-precision arithmetic to reduce memory requirements while maintaining model accuracy. We present an algorithm that further reduces memory usage during training by eliminating the floating point copy of the parameters, virtually keeping only half-precision numbers. We also explore the benefits of not storing the gradient values, which we achieve by executing the optimizer step during back-propagation. In practice, we achieve up to 25% lower peak memory use and 15% faster training while maintaining the same level of accuracy.
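The idea of running the optimizer step during back-propagation can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes plain SGD, a toy chain of scalar layers, and a squared-error loss, and the function names (`forward`, `train_step_standard`, `train_step_fused`) are hypothetical. Python floats are double precision, so the sketch shows only the gradient-handling part, not the half-precision storage.

```python
def forward(weights, x):
    """Run a chain of scalar layers y = w * x, recording each layer's input."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = w * x
    return x, inputs


def train_step_standard(weights, x, target, lr):
    """Back-propagate fully, keep every gradient, then apply SGD."""
    y, inputs = forward(weights, x)
    upstream = 2.0 * (y - target)  # d(loss)/dy for loss = (y - target)**2
    grads = [0.0] * len(weights)   # one stored gradient per parameter
    for i in reversed(range(len(weights))):
        grads[i] = upstream * inputs[i]
        upstream = upstream * weights[i]
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return new_weights, len(grads)  # number of gradients held at once


def train_step_fused(weights, x, target, lr):
    """Apply the SGD update for each layer as soon as its gradient exists."""
    y, inputs = forward(weights, x)
    upstream = 2.0 * (y - target)
    weights = list(weights)
    for i in reversed(range(len(weights))):
        grad = upstream * inputs[i]
        # Propagate with the pre-update weight, then update in place.
        upstream = upstream * weights[i]
        weights[i] = weights[i] - lr * grad
        # grad goes out of scope here; it is never stored across layers.
    return weights, 1  # only one parameter gradient is live at a time


w0 = [0.5, -1.5, 2.0]
standard_weights, standard_live = train_step_standard(w0, 1.0, 3.0, 0.01)
fused_weights, fused_live = train_step_fused(w0, 1.0, 3.0, 0.01)
print(standard_weights == fused_weights, standard_live, fused_live)
```

Both variants produce the same updated weights in this sketch because each layer propagates the upstream gradient with its pre-update weight. The difference is that the fused variant never holds more than one parameter gradient at a time.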

Cite

@article{arxiv.2309.12381,
  title  = {Memory Efficient Mixed-Precision Optimizers},
  author = {Basile Lewandowski and Atli Kosson},
  journal= {arXiv preprint arXiv:2309.12381},
  year   = {2023}
}