Related papers: A Function-Centric Perspective on Flat and Sharp M…

200 papers

Sharpness of minima is a promising quantity that can correlate with generalization in deep networks and, when optimized during training, can improve generalization. However, standard sharpness is not invariant under reparametrizations of…

Machine Learning · Computer Science 2023-06-08 Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion

The correlation between the sharpness of loss minima and generalisation in the context of deep neural networks has been subject to discussion for a long time. Whilst mostly investigated in the context of selected benchmark data sets in the…

Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice. However, explaining why this is the case is still an open area of…

Machine Learning · Computer Science 2017-11-15 Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio

Neural networks that land in flat regions of the loss landscape tend to generalise better than those in sharp regions. Sharpness-Aware Minimisation exploits this to improve generalisation. But function-preserving reparameterisation can…

Machine Learning · Computer Science 2026-05-08 Michael Timothy Bennett

Recent literature on generalization in deep learning has examined the relationship between the curvature of the loss function at minima and generalization, mainly in the context of overparameterized neural networks. A key observation is that…

Machine Learning · Computer Science 2025-10-01 Neta Shoham, Liron Mor-Yosef, Haim Avron

Models trained in federated settings often suffer from degraded performance and fail to generalize, especially when facing heterogeneous scenarios. In this work, we investigate such behavior through the lens of geometry of the loss and…

Machine Learning · Computer Science 2022-07-22 Debora Caldarola, Barbara Caputo, Marco Ciccone

Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have…

Machine Learning · Computer Science 2023-01-30 Jean Kaddour, Linqing Liu, Ricardo Silva, Matt J. Kusner

This paper proposes a theoretical framework to evaluate and compare the performance of stochastic gradient algorithms for distributed learning in relation to their behavior around local minima in nonconvex environments. Previous works have…

Machine Learning · Computer Science 2025-07-03 Ying Cao, Zhaoxian Wu, Kun Yuan, Ali H. Sayed

Flat minima, known to enhance generalization and robustness in supervised learning, remain largely unexplored in generative models. In this work, we systematically investigate the role of loss surface flatness in generative models, both…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Taehwan Lee, Kyeongkook Seo, Jaejun Yoo, Sung Whan Yoon

Sharpness-Aware Minimization (SAM) enhances generalization by reducing a Max-Sharpness (MaxS). Despite the practical success, we empirically found that the MaxS behind SAM's generalization enhancements faces the "Flatness Indicator Problem"…

Computer Vision and Pattern Recognition · Computer Science 2024-09-23 Jiaxin Deng, Junbiao Pang, Baochang Zhang, Qingming Huang

Understanding the generalization behavior of learning algorithms is a central goal of learning theory. A recently emerging explanation is that learning algorithms are successful in practice because they converge to flat minima, which have…

Machine Learning · Computer Science 2026-05-26 Matan Schliserman, Shira Vansover-Hager, Tomer Koren

Recently, the Sharpness-Aware Minimization (SAM) algorithm has shown state-of-the-art generalization abilities in vision tasks. It demonstrates that flat minima tend to imply better generalization abilities. However, it has some difficulty…

Machine Learning · Computer Science 2022-10-14 Zhiyuan Zhang, Ruixuan Luo, Qi Su, Xu Sun

The volume hypothesis suggests deep learning is effective because it is likely to find flat minima due to their large volumes, and flat minima generalize well. This picture does not explain the role of large datasets in generalization.…

Machine Learning · Computer Science 2025-11-10 Raymond Fan, Bryce Sandlund, Lin Myat Ko

Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a…

Machine Learning · Computer Science 2023-07-25 Kaiyue Wen, Zhiyuan Li, Tengyu Ma

The notion of flat minima has played a key role in the generalization studies of deep learning models. However, existing definitions of the flatness are known to be sensitive to the rescaling of parameters. The issue suggests that the…

Machine Learning · Statistics 2019-01-29 Yusuke Tsuzuku, Issei Sato, Masashi Sugiyama

Recent studies showed that the generalization of neural networks is correlated with the sharpness of the loss landscape, and flat minima suggest a better generalization ability than sharp minima. In this paper, we propose a novel method…

Machine Learning · Computer Science 2024-05-24 Yuyan Zhou, Ye Li, Lei Feng, Sheng-Jun Huang

Graph Neural Networks (GNNs) have achieved impressive performance in collaborative filtering. However, GNNs tend to yield inferior performance when the distributions of training and test data are not aligned well. Also, training GNNs…

Machine Learning · Computer Science 2023-07-19 Huiyuan Chen, Chin-Chia Michael Yeh, Yujie Fan, Yan Zheng, Junpeng Wang, Vivian Lai, Mahashweta Das, Hao Yang

A large body of theory and empirical work hypothesizes a connection between the flatness of a neural network's loss landscape during training and its performance. However, there have been conceptually opposite pieces of evidence regarding…

Machine Learning · Computer Science 2026-02-06 Yizhou Xu, Pierfrancesco Beneventano, Isaac Chuang, Liu Ziyin

Little research explores the correlation between the expressive ability and generalization ability of the low-rank adaptation (LoRA). Sharpness-Aware Minimization (SAM) improves model generalization for both Convolutional Neural Networks…

Computation and Language · Computer Science 2025-12-16 Jiaxin Deng, Qingcheng Zhu, Junbiao Pang, Linlin Yang, Zhongqian Fu, Baochang Zhang

Sharpness-Aware Minimization (SAM) is a recent training method that relies on worst-case weight perturbations and significantly improves generalization in various settings. We argue that the existing justifications for the success of SAM…

Machine Learning · Computer Science 2022-06-14 Maksym Andriushchenko, Nicolas Flammarion