Related papers: Relative Flatness and Generalization

A Reparameterization-Invariant Flatness Measure for Deep Neural Networks

The performance of deep neural networks is often attributed to their automated, task-related feature construction. It remains an open question, though, why this leads to solutions with good generalization, even in cases where the number of…

Machine Learning · Computer Science 2019-12-03 Henning Petzka, Linara Adilova, Michael Kamp, Cristian Sminchisescu

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization

Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a…

Machine Learning · Computer Science 2023-07-25 Kaiyue Wen, Zhiyuan Li, Tengyu Ma

Why flatness does and does not correlate with generalization for deep neural networks

The intuition that local flatness of the loss landscape is correlated with better generalization for deep neural networks (DNNs) has been explored for decades, spawning many different flatness measures. Recently, this link with…

Machine Learning · Computer Science 2021-06-22 Shuofeng Zhang, Isaac Reid, Guillermo Valle Pérez, Ard Louis

Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking

Neural collapse, i.e., the emergence of highly symmetric, class-wise clustered representations, is frequently observed in deep networks and is often assumed to reflect or enable generalization. In parallel, flatness of the loss landscape…

Machine Learning · Computer Science 2026-02-05 Ting Han, Linara Adilova, Henning Petzka, Jens Kleesiek, Michael Kamp

FAM: Relative Flatness Aware Minimization

Flatness of the loss curve around a model at hand has been shown to empirically correlate with its generalization ability. Optimizing for flatness has been proposed as early as 1994 by Hochreiter and Schmidhuber, and was followed by more…

Machine Learning · Computer Science 2023-07-06 Linara Adilova, Amr Abourayya, Jianning Li, Amin Dada, Henning Petzka, Jan Egger, Jens Kleesiek, Michael Kamp

Does Flatness imply Generalization for Logistic Loss in Univariate Two-Layer ReLU Network?

We consider the problem of generalization of arbitrarily overparameterized two-layer ReLU Neural Networks with univariate input. Recent work showed that under square loss, flat solutions (motivated by flat / stable minima and Edge of…

Machine Learning · Computer Science 2025-12-02 Dan Qiao, Yu-Xiang Wang

A Modern Look at the Relationship between Sharpness and Generalization

Sharpness of minima is a promising quantity that can correlate with generalization in deep networks and, when optimized during training, can improve generalization. However, standard sharpness is not invariant under reparametrizations of…

Machine Learning · Computer Science 2023-06-08 Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion

Flatness After All?

Recent literature on generalization in deep learning has examined the relationship between the curvature of the loss function at minima and generalization, mainly in the context of overparameterized neural networks. A key observation is that…

Machine Learning · Computer Science 2025-10-01 Neta Shoham, Liron Mor-Yosef, Haim Avron

Sharp Minima Can Generalize For Deep Nets

Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice. However, explaining why this is the case is still an open area of…

Machine Learning · Computer Science 2017-11-15 Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio

Relating Adversarially Robust Generalization to Flat Minima

Adversarial training (AT) has become the de-facto standard to obtain models robust against adversarial examples. However, AT exhibits severe robust overfitting: cross-entropy loss on adversarial examples, so-called robust loss, decreases…

Machine Learning · Computer Science 2021-10-07 David Stutz, Matthias Hein, Bernt Schiele

Towards Understanding Generalization in Gradient-Based Meta-Learning

In this work we study generalization of neural networks in gradient-based meta-learning by analyzing various properties of the objective landscapes. We experimentally demonstrate that as meta-training progresses, the meta-test solutions,…

Machine Learning · Computer Science 2019-07-18 Simon Guiroy, Vikas Verma, Christopher Pal

The Geometry of Neural Nets' Parameter Spaces Under Reparametrization

Model reparametrization, which follows the change-of-variable rule of calculus, is a popular way to improve the training of neural nets. But it can also be problematic since it can induce inconsistencies in, e.g., Hessian-based flatness…

Machine Learning · Computer Science 2023-10-24 Agustinus Kristiadi, Felix Dangel, Philipp Hennig

Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It

The concept of sharpness has been successfully applied to traditional architectures like MLPs and CNNs to predict their generalization. For transformers, however, recent work reported weak correlation between flatness and generalization. We…

Machine Learning · Computer Science 2025-05-09 Marvin F. da Silva, Felix Dangel, Sageev Oore

When Flatness Does (Not) Guarantee Adversarial Robustness

Despite their empirical success, neural networks remain vulnerable to small, adversarial perturbations. A longstanding hypothesis suggests that flat minima, regions of low curvature in the loss landscape, offer increased robustness. While…

Machine Learning · Computer Science 2025-10-17 Nils Philipp Walter, Linara Adilova, Jilles Vreeken, Michael Kamp

Modeling Generalization in Machine Learning: A Methodological and Computational Study

As machine learning becomes more and more available to the general public, theoretical questions are turning into pressing practical issues. Possibly, one of the most relevant concerns is the assessment of our confidence in trusting machine…

Machine Learning · Computer Science 2020-06-30 Pietro Barbiero, Giovanni Squillero, Alberto Tonda

Are Flat Minima an Illusion?

Neural networks that land in flat regions of the loss landscape tend to generalise better than those in sharp regions. Sharpness-Aware Minimisation exploits this to improve generalisation. But function-preserving reparameterisation can…

Machine Learning · Computer Science 2026-05-08 Michael Timothy Bennett

Robustness to Pruning Predicts Generalization in Deep Neural Networks

Existing generalization measures that aim to capture a model's simplicity based on parameter counts or norms fail to explain generalization in overparameterized deep neural networks. In this paper, we introduce a new, theoretically…

Machine Learning · Computer Science 2021-03-11 Lorenz Kuhn, Clare Lyle, Aidan N. Gomez, Jonas Rothfuss, Yarin Gal

Flatness is a False Friend

Hessian-based measures of flatness, such as the trace, Frobenius and spectral norms, have been argued, used and shown to relate to generalisation. In this paper we demonstrate that for feed-forward neural networks under the cross entropy…

Machine Learning · Statistics 2020-06-17 Diego Granziol

Exploring Generalization in Deep Learning

With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness. We study how these measures can ensure generalization,…

Machine Learning · Computer Science 2017-07-07 Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro

Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis

The notion of flat minima has played a key role in the generalization studies of deep learning models. However, existing definitions of the flatness are known to be sensitive to the rescaling of parameters. The issue suggests that the…

Machine Learning · Statistics 2019-01-29 Yusuke Tsuzuku, Issei Sato, Masashi Sugiyama