Related papers: Non-Convex Optimization with Spectral Radius Regul…

200 papers

How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning, especially for today's heavily overparameterized networks. In this paper, we propose an effective method to improve the model…

Machine Learning · Computer Science 2022-06-28 Yang Zhao, Hao Zhang, Xiuyuan Hu

Recent studies showed that the generalization of neural networks is correlated with the sharpness of the loss landscape, and that flat minima suggest a better generalization ability than sharp minima. In this paper, we propose a novel method…

Machine Learning · Computer Science 2024-05-24 Yuyan Zhou, Ye Li, Lei Feng, Sheng-Jun Huang

We investigate the generalizability of deep learning based on the sensitivity to input perturbation. We hypothesize that the high sensitivity to the perturbation of data degrades the performance on it. To reduce the sensitivity to…

Machine Learning · Statistics 2017-06-01 Yuichi Yoshida, Takeru Miyato
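For a linear map x ↦ Wx, the output change under an input perturbation δ is bounded by ‖W‖₂‖δ‖, so the spectral norm ‖W‖₂ (largest singular value) is one natural measure of the sensitivity this snippet describes. The snippet is cut off before stating its method, so the following is only an illustrative sketch, not the authors' algorithm: it estimates ‖W‖₂ by power iteration on a toy matrix whose singular values are chosen in advance. The function name `spectral_norm` and the test matrix are assumptions for illustration.

```python
import numpy as np

def spectral_norm(W, iters=100, seed=0):
    """Estimate the largest singular value of W by power iteration (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        v = W.T @ u              # move toward the top right singular vector
        v /= np.linalg.norm(v)
        u = W @ v                # move toward the top left singular vector
        u /= np.linalg.norm(u)
    return float(u @ W @ v)      # equals ||W v|| at the last step, i.e. ~sigma_max

# Toy 6x4 matrix with known singular values 3.0, 1.0, 0.5, 0.1 (illustrative assumption).
rng = np.random.default_rng(42)
U, _ = np.linalg.qr(rng.standard_normal((6, 4)))   # 6x4, orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # 4x4 orthogonal
W = U @ np.diag([3.0, 1.0, 0.5, 0.1]) @ V.T

sigma = spectral_norm(W)

# Sensitivity bound: ||W(x + delta) - W x|| <= sigma * ||delta||.
x = rng.standard_normal(4)
delta = rng.standard_normal(4)
change = np.linalg.norm(W @ (x + delta) - W @ x)
print(sigma, change, sigma * np.linalg.norm(delta))
```

Power iteration needs only matrix-vector products, which is why it is a common choice when W is large; a spectral-norm penalty such as λ·σ² could then be added to a training loss, with λ a free hyperparameter.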

Proper regularization is critical for speeding up training, improving generalization performance, and learning compact models that are cost efficient. We propose and analyze regularized gradient descent algorithms for learning shallow…

Machine Learning · Computer Science 2018-06-08 Samet Oymak

The training of over-parameterized neural networks has received much study in recent literature. An important consideration is the regularization of over-parameterized networks due to their highly nonconvex and nonlinear geometry. In this…

Machine Learning · Computer Science 2024-09-24 Hongyang R. Zhang, Dongyue Li, Haotian Ju

Our work focuses on stochastic gradient methods for optimizing a smooth non-convex loss function with a non-smooth non-convex regularizer. Research on this class of problems is quite limited, and until recently no non-asymptotic convergence…

Optimization and Control · Mathematics 2019-05-15 Michael R. Metel, Akiko Takeda

Models trained in federated settings often suffer from degraded performance and fail to generalize, especially in heterogeneous scenarios. In this work, we investigate this behavior through the lens of the geometry of the loss and…

Machine Learning · Computer Science 2022-07-22 Debora Caldarola, Barbara Caputo, Marco Ciccone

In this paper, we develop a novel regularization method for deep neural networks that penalizes the trace of the Hessian. This regularizer is motivated by a recent generalization error bound. We explain its benefits in finding…

Machine Learning · Computer Science 2023-02-23 Yucong Liu, Shixing Yu, Tong Lin
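Penalizing the Hessian trace requires an estimate of tr(H), and forming H explicitly is usually impractical for a deep network. A common workaround is Hutchinson's stochastic estimator, tr(H) ≈ (1/m) Σₖ vₖᵀ H vₖ with random ±1 (Rademacher) vectors vₖ, which needs only Hessian-vector products. The snippet does not say which estimator the authors use, so this is a generic sketch on a toy quadratic loss whose Hessian is known; `hutchinson_trace`, `penalized_loss` and `lam` are hypothetical names.

```python
import numpy as np

def hutchinson_trace(hvp, dim, num_samples=1000, seed=0):
    """Estimate tr(H) as the mean of v^T H v over Rademacher vectors v.

    Only Hessian-vector products hvp(v) = H @ v are used, never H itself.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # E[v v^T] = I, so E[v^T H v] = tr(H)
        total += v @ hvp(v)
    return total / num_samples

def penalized_loss(loss, hvp, dim, lam=0.1, num_samples=1000):
    """Hypothetical regularized objective: loss + lam * estimated tr(H)."""
    return loss + lam * hutchinson_trace(hvp, dim, num_samples)

# Toy quadratic L(w) = 0.5 * w^T A w; its Hessian is A at every w (illustrative assumption).
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B @ B.T                       # symmetric positive semidefinite
w = np.ones(5)
loss = 0.5 * w @ A @ w

est = hutchinson_trace(lambda v: A @ v, dim=5, num_samples=20000)
reg = penalized_loss(loss, lambda v: A @ v, dim=5)
print(est, np.trace(A), reg)
```

In a real network, `hvp` would be supplied by automatic differentiation (a gradient of the gradient-vector product) rather than an explicit matrix.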

A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this…

Machine Learning · Computer Science 2025-12-19 Maria Matveev, Vit Fojtik, Hung-Hsu Chou, Gitta Kutyniok, Johannes Maly

We introduce a novel stochastic regularization technique for deep neural networks, which decomposes a layer into multiple branches with different parameters and merges stochastically sampled combinations of the outputs from the branches…

Machine Learning · Computer Science 2019-10-04 Wonpyo Park, Paul Hongsuck Seo, Bohyung Han, Minsu Cho

Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima -- those around which the…

Machine Learning · Computer Science 2023-02-20 Lijun Ding, Dmitriy Drusvyatskiy, Maryam Fazel, Zaid Harchaoui

Deep Neural Networks have achieved remarkable success, relying on the growing computational capability of GPUs and on large-scale datasets, with increasing network depth and width, in image recognition, object detection and many other…

Machine Learning · Computer Science 2020-01-08 E Zhenqian, Gao Weiguo

It is important to understand how dropout, a popular regularization method, helps neural network training find solutions that generalize well. In this work, we show that training with dropout finds a neural network with a flatter…

Machine Learning · Computer Science 2022-05-24 Zhongwang Zhang, Hanxu Zhou, Zhi-Qin John Xu

Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have…

Machine Learning · Computer Science 2023-01-30 Jean Kaddour, Linqing Liu, Ricardo Silva, Matt J. Kusner
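The snippet describes flat-minima optimizers as seeking parameters in low-loss neighborhoods but is cut off before naming the two methods it studies. One widely used optimizer of this kind is sharpness-aware minimization (SAM; Foret et al., 2021). Each step first moves to the approximate worst-case point w + ε inside an L2 ball of radius ρ, using the first-order solution ε = ρ·g/‖g‖, and then descends using the gradient taken there. This sketch makes no claim about which methods the paper compares; the toy quadratic loss and the values of `lr` and `rho` are illustrative assumptions.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM update on parameters w, given a gradient function grad_fn."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # first-order worst-case perturbation
    g_adv = grad_fn(w + eps)                     # gradient at the perturbed point
    return w - lr * g_adv                        # step taken from the original w

# Toy quadratic with one flat (curvature 1) and one sharp (curvature 10) direction.
A = np.diag([1.0, 10.0])
loss = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

w0 = np.array([1.0, 1.0])
w = w0.copy()
for _ in range(200):
    w = sam_step(w, grad)
print(loss(w0), loss(w))
```

Because ‖ε‖ = ρ on every step, SAM does not settle exactly at the minimizer of this quadratic. It hovers within a small neighborhood whose size scales with `lr` and `rho`, which is the trade-off it makes in exchange for penalizing sharp directions.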

Recent literature on generalization in deep learning has examined the relationship between the curvature of the loss function at minima and generalization, mainly in the context of overparameterized neural networks. A key observation is that…

Machine Learning · Computer Science 2025-10-01 Neta Shoham, Liron Mor-Yosef, Haim Avron

Recent works have shown that on sufficiently over-parametrized neural nets, gradient descent with relatively large initialization optimizes a prediction function in the RKHS of the Neural Tangent Kernel (NTK). This analysis leads to global…

Machine Learning · Statistics 2020-04-28 Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma

Flatness of the loss curve around a model at hand has been shown to empirically correlate with its generalization ability. Optimizing for flatness was proposed as early as 1994 by Hochreiter and Schmidhuber, and was followed by more…

Machine Learning · Computer Science 2023-07-06 Linara Adilova, Amr Abourayya, Jianning Li, Amin Dada, Henning Petzka, Jan Egger, Jens Kleesiek, Michael Kamp

Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the…

Machine Learning · Computer Science 2023-06-26 Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank Reddi, Tengyu Ma, Stefanie Jegelka

We analyze the training dynamics of deep linear networks using a new metric, layer imbalance, which defines the flatness of a solution. We demonstrate that different regularization methods, such as weight decay or noise data…

Machine Learning · Computer Science 2020-07-21 Boris Ginsburg

Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice. However, explaining why this is the case is still an open area of…

Machine Learning · Computer Science 2017-11-15 Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio