Related papers: Per-Example Gradient Regularization Improves Learn…

Implicit Gradient Regularization

Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient…

Machine Learning · Computer Science 2022-07-20 David G. T. Barrett, Benoit Dherin

Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee

Over-parameterized deep neural networks trained by simple first-order methods are known to be able to fit any labeling of data. Such over-fitting ability hinders generalization when mislabeled training examples are present. On the other…

Machine Learning · Computer Science 2020-10-06 Wei Hu, Zhiyuan Li, Dingli Yu

Gradient Regularization Improves Accuracy of Discriminative Models

Regularizing the gradient norm of the output of a neural network with respect to its inputs is a powerful technique, rediscovered several times. This paper presents evidence that gradient regularization can consistently improve…

Machine Learning · Computer Science 2018-05-28 Dániel Varga, Adrián Csiszárik, Zsolt Zombori

Adversarially Robust Training through Structured Gradient Regularization

We propose a novel data-dependent structured gradient regularizer to increase the robustness of neural networks vis-a-vis adversarial perturbations. Our regularizer can be derived as a controlled approximation from first principles,…

Machine Learning · Statistics 2018-05-23 Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann

Generalized Deep Learning-based Proximal Gradient Descent for MR Reconstruction

The data consistency for the physical forward model is crucial in inverse problems, especially in MR imaging reconstruction. The standard way is to unroll an iterative algorithm into a neural network with a forward model embedded. The…

Image and Video Processing · Electrical Eng. & Systems 2023-06-28 Guanxiong Luo, Mengmeng Kuang, Peng Cao

Regularization in network optimization via trimmed stochastic gradient descent with noisy label

Regularization is essential for avoiding over-fitting to training data in network optimization, leading to better generalization of the trained networks. The label noise provides a strong implicit regularization by replacing the target…

Machine Learning · Computer Science 2022-05-04 Kensuke Nakamura, Bong-Soo Sohn, Kyoung-Jae Won, Byung-Woo Hong

Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias

Gradient regularization (GR) is a method that penalizes the gradient norm of the training loss during training. While some studies have reported that GR can improve generalization performance, little attention has been paid to it from the…

Machine Learning · Computer Science 2023-02-06 Ryo Karakida, Tomoumi Takase, Tomohiro Hayase, Kazuki Osawa

On the Noisy Gradient Descent that Generalizes as SGD

The gradient noise of SGD is considered to play a central role in the observed strong generalization abilities of deep learning. While past studies confirm that the magnitude and the covariance structure of gradient noise are critical for…

Machine Learning · Computer Science 2020-06-22 Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu

Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning, especially for severely overparameterized networks nowadays. In this paper, we propose an effective method to improve the model…

Machine Learning · Computer Science 2022-06-28 Yang Zhao, Hao Zhang, Xiuyuan Hu

Explicit Regularization in Overparametrized Models via Noise Injection

Injecting noise within gradient descent has several desirable features, such as smoothing and regularizing properties. In this paper, we investigate the effects of injecting noise before computing a gradient step. We demonstrate that small…

Machine Learning · Computer Science 2023-01-24 Antonio Orvieto, Anant Raj, Hans Kersting, Francis Bach

Regularizing Neural Networks with Meta-Learning Generative Models

This paper investigates methods for improving generative data augmentation for deep learning. Generative data augmentation leverages the synthetic samples produced by generative models as an additional dataset for classification with small…

Machine Learning · Computer Science 2023-10-24 Shin'ya Yamaguchi, Daiki Chijiwa, Sekitoshi Kanai, Atsutoshi Kumagai, Hisashi Kashima

How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?

The capacity of deep learning models is often large enough to both learn the underlying statistical signal and overfit to noise in the training set. This noise memorization can be harmful especially for data with a low signal-to-noise ratio…

Machine Learning · Computer Science 2025-10-21 Wei Huang, Andi Han, Yujin Song, Yilan Chen, Denny Wu, Difan Zou, Taiji Suzuki

Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization

A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this…

Machine Learning · Computer Science 2025-12-19 Maria Matveev, Vit Fojtik, Hung-Hsu Chou, Gitta Kutyniok, Johannes Maly

When Will Gradient Regularization Be Harmful?

Gradient regularization (GR), which aims to penalize the gradient norm atop the loss function, has shown promising results in training modern over-parameterized deep neural networks. However, can we trust this powerful technique? This paper…

Machine Learning · Computer Science 2024-06-17 Yang Zhao, Hao Zhang, Xiuyuan Hu

On regularization of gradient descent, layer imbalance and flat minima

We analyze the training dynamics for deep linear networks using a new metric - layer imbalance - which defines the flatness of a solution. We demonstrate that different regularization methods, such as weight decay or noise data…

Machine Learning · Computer Science 2020-07-21 Boris Ginsburg

On the Interpretability of Regularisation for Neural Networks Through Model Gradient Similarity

Most complex machine learning and modelling techniques are prone to over-fitting and may subsequently generalise poorly to future data. Artificial neural networks are no different in this regard and, despite having a level of implicit…

Machine Learning · Statistics 2022-05-26 Vincent Szolnoky, Viktor Andersson, Balazs Kulcsar, Rebecka Jörnsten

Gradient Regularized Natural Gradients

Gradient regularization (GR) has been shown to improve the generalizability of trained models. While Natural Gradient Descent has been shown to accelerate optimization in the initial phase of training, little attention has been paid to how…

Machine Learning · Computer Science 2026-03-27 Satya Prakash Dash, Hossein Abdi, Wei Pan, Samuel Kaski, Mingfei Sun

Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization

Overfitting is one of the most critical challenges in deep neural networks, and there are various types of regularization methods to improve generalization performance. Injecting noises to hidden units during training, e.g., dropout, is…

Machine Learning · Computer Science 2017-11-10 Hyeonwoo Noh, Tackgeun You, Jonghwan Mun, Bohyung Han

Mitigating Dataset Bias by Using Per-sample Gradient

The performance of deep neural networks is strongly influenced by the training dataset setup. In particular, when attributes having a strong correlation with the target attribute are present, the trained model can provide unintended…

Machine Learning · Computer Science 2023-02-14 Sumyeong Ahn, Seongyoon Kim, Se-young Yun

Scaling Private Deep Learning with Low-Rank and Sparse Gradients

Applying Differentially Private Stochastic Gradient Descent (DPSGD) to training modern, large-scale neural networks such as transformer-based models is a challenging task, as the magnitude of noise added to the gradients at each iteration…

Machine Learning · Computer Science 2022-07-07 Ryuichi Ito, Seng Pei Liew, Tsubasa Takahashi, Yuya Sasaki, Makoto Onizuka