Related papers: A Function-Centric Perspective on Flat and Sharp M…

A Modern Look at the Relationship between Sharpness and Generalization

Sharpness of minima is a promising quantity that can correlate with generalization in deep networks and, when optimized during training, can improve generalization. However, standard sharpness is not invariant under reparametrizations of…

Machine Learning · Computer Science 2023-06-08 Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion

Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification

The correlation between the sharpness of loss minima and generalisation in the context of deep neural networks has been subject to discussion for a long time. Whilst mostly investigated in the context of selected benchmark data sets in the…

Sound · Computer Science 2024-01-17 Manuel Milling, Andreas Triantafyllopoulos, Iosif Tsangko, Simon David Noel Rampp, Björn Wolfgang Schuller

Sharp Minima Can Generalize For Deep Nets

Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice. However, explaining why this is the case is still an open area of…

Machine Learning · Computer Science 2017-11-15 Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio

Are Flat Minima an Illusion?

Neural networks that land in flat regions of the loss landscape tend to generalise better than those in sharp regions. Sharpness-Aware Minimisation exploits this to improve generalisation. But function-preserving reparameterisation can…

Machine Learning · Computer Science 2026-05-08 Michael Timothy Bennett

Flatness After All?

Recent literature on generalization in deep learning has examined the relationship between the curvature of the loss function at minima and generalization, mainly in the context of overparameterized neural networks. A key observation is that…

Machine Learning · Computer Science 2025-10-01 Neta Shoham, Liron Mor-Yosef, Haim Avron

Improving Generalization in Federated Learning by Seeking Flat Minima

Models trained in federated settings often suffer from degraded performance and fail to generalize, especially when facing heterogeneous scenarios. In this work, we investigate such behavior through the lens of geometry of the loss and…

Machine Learning · Computer Science 2022-07-22 Debora Caldarola, Barbara Caputo, Marco Ciccone

When Do Flat Minima Optimizers Work?

Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have…

Machine Learning · Computer Science 2023-01-30 Jean Kaddour, Linqing Liu, Ricardo Silva, Matt J. Kusner

On the Trade-off between Flatness and Optimization in Distributed Learning

This paper proposes a theoretical framework to evaluate and compare the performance of stochastic gradient algorithms for distributed learning in relation to their behavior around local minima in nonconvex environments. Previous works have…

Machine Learning · Computer Science 2025-07-03 Ying Cao, Zhaoxian Wu, Kun Yuan, Ali H. Sayed

Understanding Flatness in Generative Models: Its Role and Benefits

Flat minima, known to enhance generalization and robustness in supervised learning, remain largely unexplored in generative models. In this work, we systematically investigate the role of loss surface flatness in generative models, both…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Taehwan Lee, Kyeongkook Seo, Jaejun Yoo, Sung Whan Yoon

Bilateral Sharpness-Aware Minimization for Flatter Minima

Sharpness-Aware Minimization (SAM) enhances generalization by reducing a Max-Sharpness (MaxS). Despite the practical success, we empirically found that the MaxS behind SAM's generalization enhancements faces the "Flatness Indicator Problem"…

Computer Vision and Pattern Recognition · Computer Science 2024-09-23 Jiaxin Deng, Junbiao Pang, Baochang Zhang, Qingming Huang

Flat Minima and Generalization: Insights from Stochastic Convex Optimization

Understanding the generalization behavior of learning algorithms is a central goal of learning theory. A recently emerging explanation is that learning algorithms are successful in practice because they converge to flat minima, which have…

Machine Learning · Computer Science 2026-05-26 Matan Schliserman, Shira Vansover-Hager, Tomer Koren

GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization

Recently, the Sharpness-Aware Minimization (SAM) algorithm has shown state-of-the-art generalization abilities in vision tasks. It demonstrates that flat minima tend to imply better generalization abilities. However, it has some difficulty…

Machine Learning · Computer Science 2022-10-14 Zhiyuan Zhang, Ruixuan Luo, Qi Su, Xu Sun

Sharp Minima Can Generalize: A Loss Landscape Perspective On Data

The volume hypothesis suggests deep learning is effective because it is likely to find flat minima due to their large volumes, and flat minima generalize well. This picture does not explain the role of large datasets in generalization.…

Machine Learning · Computer Science 2025-11-10 Raymond Fan, Bryce Sandlund, Lin Myat Ko

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization

Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a…

Machine Learning · Computer Science 2023-07-25 Kaiyue Wen, Zhiyuan Li, Tengyu Ma

Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis

The notion of flat minima has played a key role in the generalization studies of deep learning models. However, existing definitions of flatness are known to be sensitive to the rescaling of parameters. The issue suggests that the…

Machine Learning · Statistics 2019-01-29 Yusuke Tsuzuku, Issei Sato, Masashi Sugiyama

Improving Generalization of Deep Neural Networks by Optimum Shifting

Recent studies showed that the generalization of neural networks is correlated with the sharpness of the loss landscape, and that flat minima suggest a better generalization ability than sharp minima. In this paper, we propose a novel method…

Machine Learning · Computer Science 2024-05-24 Yuyan Zhou, Ye Li, Lei Feng, Sheng-Jun Huang

Sharpness-Aware Graph Collaborative Filtering

Graph Neural Networks (GNNs) have achieved impressive performance in collaborative filtering. However, GNNs tend to yield inferior performance when the distributions of training and test data are not aligned well. Also, training GNNs…

Machine Learning · Computer Science 2023-07-19 Huiyuan Chen, Chin-Chia Michael Yeh, Yujie Fan, Yan Zheng, Junpeng Wang, Vivian Lai, Mahashweta Das, Hao Yang

Does SGD Seek Flatness or Sharpness? An Exactly Solvable Model

A large body of theory and empirical work hypothesizes a connection between the flatness of a neural network's loss landscape during training and its performance. However, there have been conceptually opposite pieces of evidence regarding…

Machine Learning · Computer Science 2026-02-06 Yizhou Xu, Pierfrancesco Beneventano, Isaac Chuang, Liu Ziyin

Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond

Little research explores the correlation between the expressive ability and generalization ability of low-rank adaptation (LoRA). Sharpness-Aware Minimization (SAM) improves model generalization for both Convolutional Neural Networks…

Computation and Language · Computer Science 2025-12-16 Jiaxin Deng, Qingcheng Zhu, Junbiao Pang, Linlin Yang, Zhongqian Fu, Baochang Zhang

Towards Understanding Sharpness-Aware Minimization

Sharpness-Aware Minimization (SAM) is a recent training method that relies on worst-case weight perturbations and significantly improves generalization in various settings. We argue that the existing justifications for the success of SAM…

Machine Learning · Computer Science 2022-06-14 Maksym Andriushchenko, Nicolas Flammarion