Related papers: Non-Convex Optimization with Spectral Radius Regul…

Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning, especially for severely overparameterized networks nowadays. In this paper, we propose an effective method to improve the model…

Machine Learning · Computer Science 2022-06-28 Yang Zhao, Hao Zhang, Xiuyuan Hu

Improving Generalization of Deep Neural Networks by Optimum Shifting

Recent studies showed that the generalization of neural networks is correlated with the sharpness of the loss landscape, and flat minima suggest a better generalization ability than sharp minima. In this paper, we propose a novel method…

Machine Learning · Computer Science 2024-05-24 Yuyan Zhou, Ye Li, Lei Feng, Sheng-Jun Huang

Spectral Norm Regularization for Improving the Generalizability of Deep Learning

We investigate the generalizability of deep learning based on the sensitivity to input perturbation. We hypothesize that the high sensitivity to the perturbation of data degrades the performance on it. To reduce the sensitivity to…

Machine Learning · Statistics 2017-06-01 Yuichi Yoshida, Takeru Miyato

Learning Compact Neural Networks with Regularization

Proper regularization is critical for speeding up training, improving generalization performance, and learning compact models that are cost efficient. We propose and analyze regularized gradient descent algorithms for learning shallow…

Machine Learning · Computer Science 2018-06-08 Samet Oymak

Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach

The training of over-parameterized neural networks has received much study in recent literature. An important consideration is the regularization of over-parameterized networks due to their highly nonconvex and nonlinear geometry. In this…

Machine Learning · Computer Science 2024-09-24 Hongyang R. Zhang, Dongyue Li, Haotian Ju

Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization

Our work focuses on stochastic gradient methods for optimizing a smooth non-convex loss function with a non-smooth non-convex regularizer. Research on this class of problem is quite limited, and until recently no non-asymptotic convergence…

Optimization and Control · Mathematics 2019-05-15 Michael R. Metel, Akiko Takeda

Improving Generalization in Federated Learning by Seeking Flat Minima

Models trained in federated settings often suffer from degraded performances and fail at generalizing, especially when facing heterogeneous scenarios. In this work, we investigate such behavior through the lens of geometry of the loss and…

Machine Learning · Computer Science 2022-07-22 Debora Caldarola, Barbara Caputo, Marco Ciccone

Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace

In this paper, we develop a novel regularization method for deep neural networks by penalizing the trace of Hessian. This regularizer is motivated by a recent guarantee bound of the generalization error. We explain its benefits in finding…

Machine Learning · Computer Science 2023-02-23 Yucong Liu, Shixing Yu, Tong Lin

Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization

A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this…

Machine Learning · Computer Science 2025-12-19 Maria Matveev, Vit Fojtik, Hung-Hsu Chou, Gitta Kutyniok, Johannes Maly

Regularizing Neural Networks via Stochastic Branch Layers

We introduce a novel stochastic regularization technique for deep neural networks, which decomposes a layer into multiple branches with different parameters and merges stochastically sampled combinations of the outputs from the branches…

Machine Learning · Computer Science 2019-10-04 Wonpyo Park, Paul Hongsuck Seo, Bohyung Han, Minsu Cho

Flat minima generalize for low-rank matrix recovery

Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima -- those around which the…

Machine Learning · Computer Science 2023-02-20 Lijun Ding, Dmitriy Drusvyatskiy, Maryam Fazel, Zaid Harchaoui

An Improving Framework of regularization for Network Compression

Deep Neural Networks have achieved remarkable success relying on the developing high computation capability of GPUs and large-scale datasets with increasing network depth and width in image recognition, object detection and many other…

Machine Learning · Computer Science 2020-01-08 E Zhenqian, Gao Weiguo

Dropout in Training Neural Networks: Flatness of Solution and Noise Structure

It is important to understand how the popular regularization method dropout helps the neural network training find a good generalization solution. In this work, we show that the training with dropout finds the neural network with a flatter…

Machine Learning · Computer Science 2022-05-24 Zhongwang Zhang, Hanxu Zhou, Zhi-Qin John Xu

When Do Flat Minima Optimizers Work?

Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have…

Machine Learning · Computer Science 2023-01-30 Jean Kaddour, Linqing Liu, Ricardo Silva, Matt J. Kusner

Flatness After All?

Recent literature on generalization in deep learning has examined the relationship between the curvature of the loss function at minima and generalization, mainly in the context of overparameterized neural networks. A key observation is that…

Machine Learning · Computer Science 2025-10-01 Neta Shoham, Liron Mor-Yosef, Haim Avron

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

Recent works have shown that on sufficiently over-parametrized neural nets, gradient descent with relatively large initialization optimizes a prediction function in the RKHS of the Neural Tangent Kernel (NTK). This analysis leads to global…

Machine Learning · Statistics 2020-04-28 Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma

FAM: Relative Flatness Aware Minimization

Flatness of the loss curve around a model at hand has been shown to empirically correlate with its generalization ability. Optimizing for flatness has been proposed as early as 1994 by Hochreiter and Schmidhuber, and was followed by more…

Machine Learning · Computer Science 2023-07-06 Linara Adilova, Amr Abourayya, Jianning Li, Amin Dada, Henning Petzka, Jan Egger, Jens Kleesiek, Michael Kamp

The Inductive Bias of Flatness Regularization for Deep Matrix Factorization

Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the…

Machine Learning · Computer Science 2023-06-26 Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank Reddi, Tengyu Ma, Stefanie Jegelka

On regularization of gradient descent, layer imbalance and flat minima

We analyze the training dynamics for deep linear networks using a new metric - layer imbalance - which defines the flatness of a solution. We demonstrate that different regularization methods, such as weight decay or noise data…

Machine Learning · Computer Science 2020-07-21 Boris Ginsburg

Sharp Minima Can Generalize For Deep Nets

Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice. However, explaining why this is the case is still an open area of…

Machine Learning · Computer Science 2017-11-15 Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio