mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization

Kayhan Behdin; Qingquan Song; Aman Gupta; Sathiya Keerthi; Ayan Acharya; Borja Ocejo; Gregory Dexter; Rajiv Khanna; David Durfee; Rahul Mazumder

Machine Learning | 2023-10-03 | v2

Abstract

Modern deep learning models are over-parameterized, so different optima can yield widely varying generalization performance. Sharpness-Aware Minimization (SAM) modifies the loss function to steer gradient descent methods toward flatter minima, which are believed to generalize better. We study a variant of SAM known as micro-batch SAM (mSAM), which aggregates the updates derived from adversarial perturbations across multiple shards (micro-batches) of a mini-batch during training. We extend a recently developed and well-studied general framework for flatness analysis to show theoretically that SAM achieves flatter minima than SGD, and that mSAM achieves even flatter minima than SAM. We support this theoretical result with a thorough empirical evaluation on a range of image classification and natural language processing tasks. We also show that, contrary to previous work, mSAM can be implemented in a flexible and parallelizable manner without significantly increasing computational costs. Across a wide range of tasks, our implementation of mSAM generalizes better than SAM, further supporting our theoretical framework.
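
The abstract describes mSAM only in words, so the following is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes a least-squares loss, a normalized-gradient ascent step of radius rho for the adversarial perturbation, and an equal-weight average over micro-batches. The names grad_mse, sam_gradient, msam_step and the default values of rho and lr are hypothetical choices made for illustration.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y)**2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

def sam_gradient(w, X, y, rho):
    """Gradient evaluated at an adversarially perturbed copy of w (SAM-style)."""
    g = grad_mse(w, X, y)
    norm = np.linalg.norm(g)
    # Ascent step of length rho along the normalized gradient.
    # If the gradient is exactly zero, no perturbation is applied.
    eps = rho * g / norm if norm > 0 else np.zeros_like(w)
    return grad_mse(w + eps, X, y)

def msam_step(w, X, y, m, rho=0.05, lr=0.1):
    """One update that averages SAM-style gradients over m micro-batches.

    Assumption: each micro-batch gets its own perturbation, and the
    resulting gradients are averaged with equal weight.
    """
    X_shards = np.array_split(X, m)
    y_shards = np.array_split(y, m)
    grads = [sam_gradient(w, Xs, ys, rho) for Xs, ys in zip(X_shards, y_shards)]
    return w - lr * np.mean(grads, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 5))      # one mini-batch of 64 synthetic examples
    y = X @ np.ones(5)                # noiseless synthetic targets
    w = np.zeros(5)
    for _ in range(200):
        w = msam_step(w, X, y, m=4)   # 4 micro-batches of 16 examples each
    print("final loss:", 0.5 * np.mean((X @ w - y) ** 2))
```

With m=1 the sketch reduces to a plain SAM-style step on the whole mini-batch. Because the per-micro-batch gradients in the list comprehension are independent of one another, they could in principle be computed in parallel, which is in the spirit of the parallelizable implementation the abstract mentions.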

Keywords

deep learning; sparse signal estimation; regularization in machine learning

Cite

@article{arxiv.2302.09693,
  title  = {mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization},
  author = {Kayhan Behdin and Qingquan Song and Aman Gupta and Sathiya Keerthi and Ayan Acharya and Borja Ocejo and Gregory Dexter and Rajiv Khanna and David Durfee and Rahul Mazumder},
  journal= {arXiv preprint arXiv:2302.09693},
  year   = {2023}
}

Comments

arXiv admin note: substantial text overlap with arXiv:2212.04343
