Related papers: When Do Flat Minima Optimizers Work?

GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization

Recently, Sharpness-Aware Minimization (SAM) algorithm has shown state-of-the-art generalization abilities in vision tasks. It demonstrates that flat minima tend to imply better generalization abilities. However, it has some difficulty…

Machine Learning · Computer Science 2022-10-14 Zhiyuan Zhang, Ruixuan Luo, Qi Su, Xu Sun

Improving Generalization in Federated Learning by Seeking Flat Minima

Models trained in federated settings often suffer from degraded performances and fail at generalizing, especially when facing heterogeneous scenarios. In this work, we investigate such behavior through the lens of geometry of the loss and…

Machine Learning · Computer Science 2022-07-22 Debora Caldarola, Barbara Caputo, Marco Ciccone

Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning

Sharpness-Aware Minimization (SAM) has emerged as a promising alternative optimizer to stochastic gradient descent (SGD). The originally-proposed motivation behind SAM was to bias neural networks towards flatter minima that are believed to…

Machine Learning · Computer Science 2024-06-03 Jacob Mitchell Springer, Vaishnavh Nagarajan, Aditi Raghunathan

Towards Understanding Sharpness-Aware Minimization

Sharpness-Aware Minimization (SAM) is a recent training method that relies on worst-case weight perturbations which significantly improves generalization in various settings. We argue that the existing justifications for the success of SAM…

Machine Learning · Computer Science 2022-06-14 Maksym Andriushchenko, Nicolas Flammarion

Bilateral Sharpness-Aware Minimization for Flatter Minima

Sharpness-Aware Minimization (SAM) enhances generalization by reducing a Max-Sharpness (MaxS). Despite the practical success, we empirically found that the MaxS behind SAM's generalization enhancements face the "Flatness Indicator Problem"…

Computer Vision and Pattern Recognition · Computer Science 2024-09-23 Jiaxin Deng, Junbiao Pang, Baochang Zhang, Qingming Huang

On Statistical Properties of Sharpness-Aware Minimization: Provable Guarantees

Sharpness-Aware Minimization (SAM) is a recent optimization framework aiming to improve the deep neural network generalization, through obtaining flatter (i.e. less sharp) solutions. As SAM has been numerically successful, recent papers…

Machine Learning · Statistics 2023-05-22 Kayhan Behdin, Rahul Mazumder

Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization

Modern deep learning models are over-parameterized, where the optimization setup strongly affects the generalization performance. A key element of reliable optimization for these systems is the modification of the loss function.…

Machine Learning · Computer Science 2022-12-09 Kayhan Behdin, Qingquan Song, Aman Gupta, David Durfee, Ayan Acharya, Sathiya Keerthi, Rahul Mazumder

Flat Minima and Generalization: Insights from Stochastic Convex Optimization

Understanding the generalization behavior of learning algorithms is a central goal of learning theory. A recently emerging explanation is that learning algorithms are successful in practice because they converge to flat minima, which have…

Machine Learning · Computer Science 2026-05-26 Matan Schliserman, Shira Vansover-Hager, Tomer Koren

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training

Sharpness-Aware Minimization (SAM) has substantially improved the generalization of neural networks under various settings. Despite the success, its effectiveness remains poorly understood. In this work, we discover an intriguing phenomenon…

Machine Learning · Computer Science 2025-02-21 Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan

Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term

Deep Neural Networks (DNNs) generalization is known to be closely related to the flatness of minima, leading to the development of Sharpness-Aware Minimization (SAM) for seeking flatter minima and better generalization. In this paper, we…

Machine Learning · Computer Science 2024-12-06 Yun Yue, Jiadi Jiang, Zhiling Ye, Ning Gao, Yongchao Liu, Ke Zhang

Normalization Layers Are All That Sharpness-Aware Minimization Needs

Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima and has been shown to enhance generalization performance in various settings. In this work we show that perturbing only the affine normalization parameters…

Machine Learning · Computer Science 2023-11-20 Maximilian Mueller, Tiffany Vlaar, David Rolnick, Matthias Hein

The Crucial Role of Normalization in Sharpness-Aware Minimization

Sharpness-Aware Minimization (SAM) is a recently proposed gradient-based optimizer (Foret et al., ICLR 2021) that greatly improves the prediction performance of deep neural networks. Consequently, there has been a surge of interest in…

Machine Learning · Computer Science 2023-10-24 Yan Dai, Kwangjun Ahn, Suvrit Sra

Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Overparametrized Deep Neural Networks (DNNs) often achieve astounding performances, but may potentially result in severe generalization error. Recently, the relation between the sharpness of the loss landscape and the generalization error…

Artificial Intelligence · Computer Science 2022-05-31 Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen, Rick Siow Mong Goh, Vincent Y. F. Tan

Rethinking Sharpness-Aware Minimization as Variational Inference

Sharpness-aware minimization (SAM) aims to improve the generalisation of gradient-based learning by seeking out flat minima. In this work, we establish connections between SAM and Mean-Field Variational Inference (MFVI) of neural network…

Machine Learning · Statistics 2022-10-20 Szilvia Ujváry, Zsigmond Telek, Anna Kerekes, Anna Mészáros, Ferenc Huszár

mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization

Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient…

Machine Learning · Statistics 2023-10-03 Kayhan Behdin, Qingquan Song, Aman Gupta, Sathiya Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, David Durfee, Rahul Mazumder

Do Sharpness-based Optimizers Improve Generalization in Medical Image Analysis?

Effective clinical deployment of deep learning models in healthcare demands high generalization performance to ensure accurate diagnosis and treatment planning. In recent years, significant research has focused on improving the…

Image and Video Processing · Electrical Eng. & Systems 2025-10-22 Mohamed Hassan, Aleksandar Vakanski, Min Xian

An Adaptive Policy to Employ Sharpness-Aware Minimization

Sharpness-aware minimization (SAM), which searches for flat minima by min-max optimization, has been shown to be useful in improving model generalization. However, since each SAM update requires computing two gradients, its computational…

Machine Learning · Computer Science 2023-05-01 Weisen Jiang, Hansi Yang, Yu Zhang, James Kwok

Tilted Sharpness-Aware Minimization

Sharpness-Aware Minimization (SAM) has been demonstrated to improve the generalization performance of overparameterized models by seeking flat minima on the loss landscape through optimizing model parameters that incur the largest loss…

Machine Learning · Computer Science 2025-06-10 Tian Li, Tianyi Zhou, Jeffrey A. Bilmes

Sharpness-Aware Graph Collaborative Filtering

Graph Neural Networks (GNNs) have achieved impressive performance in collaborative filtering. However, GNNs tend to yield inferior performance when the distributions of training and test data are not aligned well. Also, training GNNs…

Machine Learning · Computer Science 2023-07-19 Huiyuan Chen, Chin-Chia Michael Yeh, Yujie Fan, Yan Zheng, Junpeng Wang, Vivian Lai, Mahashweta Das, Hao Yang

A Function-Centric Perspective on Flat and Sharp Minima

Flat minima are strongly associated with improved generalisation in deep neural networks. However, this connection has proven nuanced in recent studies, with both theoretical counterexamples and empirical exceptions emerging in the…

Machine Learning · Computer Science 2026-04-16 Israel Mason-Williams, Gabryel Mason-Williams, Helen Yannakoudakis