Minibatching Offers Improved Generalization Performance for Second Order Optimizers

Eric Silk; Swarnita Chakraborty; Nairanjana Dasgupta; Anand D. Sarwate; Andrew Lumsdaine; Tony Chiang

Machine Learning 2023-07-24 v1

Abstract

Training the deep neural networks (DNNs) used in modern machine learning is computationally expensive. Machine learning scientists therefore rely on stochastic first-order methods, coupled with significant hand-tuning, to obtain good performance. To better understand the performance variability of different stochastic algorithms, including second-order methods, we conduct an empirical study that treats performance as a response variable across multiple training sessions of the same model. Using two-factor Analysis of Variance (ANOVA) with interactions, we show that the batch size used during training has a statistically significant effect on the peak accuracy of the methods, and that full-batch training largely performed the worst. In addition, we found that second-order optimizers (SOOs) generally exhibited significantly lower variance at specific batch sizes, suggesting that they may require less hyperparameter tuning, which would reduce the overall time to solution for model training.
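As an illustration of the kind of analysis the abstract describes, the following is a minimal sketch of a balanced two-factor ANOVA with an interaction term, written in plain Python. It is not the authors' code. The function name `two_way_anova`, the factor levels, and all accuracy values are assumptions made up for this example; the numbers are synthetic and are not results from the paper.

```python
import random
from statistics import mean


def two_way_anova(cells):
    """Balanced two-factor ANOVA with interaction (illustrative sketch).

    `cells` maps (level_a, level_b) -> list of replicate responses.
    Every combination of levels must be present with the same number
    of replicates (at least 2). Returns {term: {"ss", "df", "F"}}.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    assert n >= 2 and all(len(v) == n for v in cells.values())
    assert len(cells) == len(a_levels) * len(b_levels)

    grand = mean(y for v in cells.values() for y in v)
    a_mean = {a: mean(y for b in b_levels for y in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(y for a in a_levels for y in cells[(a, b)]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for the main effects, the interaction, and the error.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    ss_err = sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v)

    df_a = len(a_levels) - 1
    df_b = len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err

    def term(ss, df):
        # F statistic: mean square of the term over the error mean square.
        return {"ss": ss, "df": df, "F": (ss / df) / ms_err}

    return {
        "A": term(ss_a, df_a),
        "B": term(ss_b, df_b),
        "A:B": term(ss_ab, df_ab),
        "error": {"ss": ss_err, "df": df_err, "F": None},
    }


# Synthetic peak accuracies (made up for illustration, NOT from the paper):
# factor A = optimizer family, factor B = batch size, 5 training runs per cell.
rng = random.Random(0)
assumed_means = {
    ("first_order", "32"): 0.90, ("first_order", "256"): 0.89, ("first_order", "full"): 0.84,
    ("second_order", "32"): 0.91, ("second_order", "256"): 0.90, ("second_order", "full"): 0.83,
}
runs = {k: [rng.gauss(m, 0.01) for _ in range(5)] for k, m in assumed_means.items()}

for name, row in two_way_anova(runs).items():
    print(name, row)
```

The sketch reports only sums of squares, degrees of freedom, and F statistics. Turning an F statistic into a p-value requires the F distribution, for example `scipy.stats.f.sf(F, df_term, df_error)` if SciPy is available. The formulas above assume a balanced design; unbalanced data would need a regression-based ANOVA instead.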

Keywords

hyperparameter optimization; neural network training; deep neural network

Cite

@article{arxiv.2307.11684,
  title  = {Minibatching Offers Improved Generalization Performance for Second Order Optimizers},
  author = {Eric Silk and Swarnita Chakraborty and Nairanjana Dasgupta and Anand D. Sarwate and Andrew Lumsdaine and Tony Chiang},
  journal= {arXiv preprint arXiv:2307.11684},
  year   = {2023}
}

Comments

14 pages, 6 figures, 5 tables
