Related papers: Sharp Minima Can Generalize For Deep Nets

200 papers

Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a…

Machine Learning · Computer Science 2023-07-25 Kaiyue Wen , Zhiyuan Li , Tengyu Ma

The volume hypothesis suggests deep learning is effective because it is likely to find flat minima due to their large volumes, and flat minima generalize well. This picture does not explain the role of large datasets in generalization.…

Machine Learning · Computer Science 2025-11-10 Raymond Fan , Bryce Sandlund , Lin Myat Ko

Recent literature on generalization in deep learning has examined the relationship between the curvature of the loss function at minima and generalization, mainly in the context of overparameterized neural networks. A key observation is that…

Machine Learning · Computer Science 2025-10-01 Neta Shoham , Liron Mor-Yosef , Haim Avron

Flat minima are strongly associated with improved generalisation in deep neural networks. However, this connection has proven nuanced in recent studies, with both theoretical counterexamples and empirical exceptions emerging in the…

Machine Learning · Computer Science 2026-04-16 Israel Mason-Williams , Gabryel Mason-Williams , Helen Yannakoudakis

The performance of deep neural networks is often attributed to their automated, task-related feature construction. It remains an open question, though, why this leads to solutions with good generalization, even in cases where the number of…

Machine Learning · Computer Science 2019-12-03 Henning Petzka , Linara Adilova , Michael Kamp , Cristian Sminchisescu

It is widely observed that deep learning models with learned parameters generalize well, even with many more model parameters than the number of training samples. We systematically investigate the underlying reasons why deep neural networks…

Machine Learning · Computer Science 2017-11-29 Lei Wu , Zhanxing Zhu , Weinan E

Sharpness of minima is a promising quantity that can correlate with generalization in deep networks and, when optimized during training, can improve generalization. However, standard sharpness is not invariant under reparametrizations of…

Machine Learning · Computer Science 2023-06-08 Maksym Andriushchenko , Francesco Croce , Maximilian Müller , Matthias Hein , Nicolas Flammarion

Neural networks that land in flat regions of the loss landscape tend to generalise better than those in sharp regions. Sharpness-Aware Minimisation exploits this to improve generalisation. But function-preserving reparameterisation can…

Machine Learning · Computer Science 2026-05-08 Michael Timothy Bennett

Deep neural networks trained on a wide range of datasets demonstrate impressive transferability. Deep features appear general in that they are applicable to many datasets and tasks. Such property is in prevalent use in real-world…

Machine Learning · Computer Science 2019-09-27 Hong Liu , Mingsheng Long , Jianmin Wang , Michael I. Jordan

We take a geometrical viewpoint and present a unifying view on supervised deep learning with the Bregman divergence loss function - this entails frequent classification and prediction tasks. Motivated by simulations we suggest that there is…

Machine Learning · Computer Science 2021-07-07 Petr Taborsky , Lars Kai Hansen

While deep learning is successful in a number of applications, it is not yet well understood theoretically. A satisfactory theoretical characterization of deep learning, however, is beginning to emerge. It covers the following questions: 1)…

Machine Learning · Computer Science 2019-08-27 Tomaso Poggio , Andrzej Banburski , Qianli Liao

Recent studies showed that the generalization of neural networks is correlated with the sharpness of the loss landscape, and flat minima suggest a better generalization ability than sharp minima. In this paper, we propose a novel method…

Machine Learning · Computer Science 2024-05-24 Yuyan Zhou , Ye Li , Lei Feng , Sheng-Jun Huang

We consider the problem of generalization of arbitrarily overparameterized two-layer ReLU Neural Networks with univariate input. Recent work showed that under square loss, flat solutions (motivated by flat / stable minima and Edge of…

Machine Learning · Computer Science 2025-12-02 Dan Qiao , Yu-Xiang Wang

Models trained in federated settings often suffer from degraded performances and fail at generalizing, especially when facing heterogeneous scenarios. In this work, we investigate such behavior through the lens of geometry of the loss and…

Machine Learning · Computer Science 2022-07-22 Debora Caldarola , Barbara Caputo , Marco Ciccone

Characterizing the remarkable generalization properties of over-parameterized neural networks remains an open problem. In this paper, we promote a shift of focus towards initialization rather than neural architecture or (stochastic)…

Machine Learning · Computer Science 2022-07-12 Sameera Ramasinghe , Lachlan MacDonald , Moshiur Farazi , Hemanth Saratchandran , Simon Lucey

The concept of sharpness has been successfully applied to traditional architectures like MLPs and CNNs to predict their generalization. For transformers, however, recent work reported weak correlation between flatness and generalization. We…

Machine Learning · Computer Science 2025-05-09 Marvin F. da Silva , Felix Dangel , Sageev Oore

By using the viewpoint of modern computational algebraic geometry, we explore properties of the optimization landscapes of the deep linear neural network models. After clarifying on the various definitions of "flat" minima, we show that the…

Machine Learning · Statistics 2018-10-19 Dhagash Mehta , Tianran Chen , Tingting Tang , Jonathan D. Hauenstein

Flatness of the loss curve around a model at hand has been shown to empirically correlate with its generalization ability. Optimizing for flatness has been proposed as early as 1994 by Hochreiter and Schmidhuber, and was followed by more…

Machine Learning · Computer Science 2023-07-06 Linara Adilova , Amr Abourayya , Jianning Li , Amin Dada , Henning Petzka , Jan Egger , Jens Kleesiek , Michael Kamp

Despite the non-convex nature of their loss functions, deep neural networks are known to generalize well when optimized with stochastic gradient descent (SGD). Recent work conjectures that SGD with proper configuration is able to find wide…

Machine Learning · Computer Science 2019-04-09 Haowei He , Gao Huang , Yang Yuan

Modern deep neural networks are highly over-parameterized compared to the data on which they are trained, yet they often generalize remarkably well. A flurry of recent work has asked: why do deep networks not overfit to their training data?…

Machine Learning · Computer Science 2023-03-24 Minyoung Huh , Hossein Mobahi , Richard Zhang , Brian Cheung , Pulkit Agrawal , Phillip Isola