English
Related papers

Related papers: Per-Example Gradient Regularization Improves Learn…

200 papers

Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient…

Machine Learning · Computer Science 2022-07-20 David G. T. Barrett , Benoit Dherin

Over-parameterized deep neural networks trained by simple first-order methods are known to be able to fit any labeling of data. Such over-fitting ability hinders generalization when mislabeled training examples are present. On the other…

Machine Learning · Computer Science 2020-10-06 Wei Hu , Zhiyuan Li , Dingli Yu

Regularizing the gradient norm of the output of a neural network with respect to its inputs is a powerful technique, rediscovered several times. This paper presents evidence that gradient regularization can consistently improve…

Machine Learning · Computer Science 2018-05-28 Dániel Varga , Adrián Csiszárik , Zsolt Zombori

We propose a novel data-dependent structured gradient regularizer to increase the robustness of neural networks vis-a-vis adversarial perturbations. Our regularizer can be derived as a controlled approximation from first principles,…

Machine Learning · Statistics 2018-05-23 Kevin Roth , Aurelien Lucchi , Sebastian Nowozin , Thomas Hofmann

The data consistency for the physical forward model is crucial in inverse problems, especially in MR imaging reconstruction. The standard way is to unroll an iterative algorithm into a neural network with a forward model embedded. The…

Image and Video Processing · Electrical Eng. & Systems 2023-06-28 Guanxiong Luo , Mengmeng Kuang , Peng Cao

Regularization is essential for avoiding over-fitting to training data in network optimization, leading to better generalization of the trained networks. The label noise provides a strong implicit regularization by replacing the target…

Machine Learning · Computer Science 2022-05-04 Kensuke Nakamura , Bong-Soo Sohn , Kyoung-Jae Won , Byung-Woo Hong

Gradient regularization (GR) is a method that penalizes the gradient norm of the training loss during training. While some studies have reported that GR can improve generalization performance, little attention has been paid to it from the…

Machine Learning · Computer Science 2023-02-06 Ryo Karakida , Tomoumi Takase , Tomohiro Hayase , Kazuki Osawa

The gradient noise of SGD is considered to play a central role in the observed strong generalization abilities of deep learning. While past studies confirm that the magnitude and the covariance structure of gradient noise are critical for…

Machine Learning · Computer Science 2020-06-22 Jingfeng Wu , Wenqing Hu , Haoyi Xiong , Jun Huan , Vladimir Braverman , Zhanxing Zhu

How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning, especially for severely overparameterized networks nowadays. In this paper, we propose an effective method to improve the model…

Machine Learning · Computer Science 2022-06-28 Yang Zhao , Hao Zhang , Xiuyuan Hu

Injecting noise within gradient descent has several desirable features, such as smoothing and regularizing properties. In this paper, we investigate the effects of injecting noise before computing a gradient step. We demonstrate that small…

Machine Learning · Computer Science 2023-01-24 Antonio Orvieto , Anant Raj , Hans Kersting , Francis Bach

This paper investigates methods for improving generative data augmentation for deep learning. Generative data augmentation leverages the synthetic samples produced by generative models as an additional dataset for classification with small…

Machine Learning · Computer Science 2023-10-24 Shin'ya Yamaguchi , Daiki Chijiwa , Sekitoshi Kanai , Atsutoshi Kumagai , Hisashi Kashima

The capacity of deep learning models is often large enough to both learn the underlying statistical signal and overfit to noise in the training set. This noise memorization can be harmful especially for data with a low signal-to-noise ratio…

Machine Learning · Computer Science 2025-10-21 Wei Huang , Andi Han , Yujin Song , Yilan Chen , Denny Wu , Difan Zou , Taiji Suzuki

A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this…

Machine Learning · Computer Science 2025-12-19 Maria Matveev , Vit Fojtik , Hung-Hsu Chou , Gitta Kutyniok , Johannes Maly

Gradient regularization (GR), which aims to penalize the gradient norm atop the loss function, has shown promising results in training modern over-parameterized deep neural networks. However, can we trust this powerful technique? This paper…

Machine Learning · Computer Science 2024-06-17 Yang Zhao , Hao Zhang , Xiuyuan Hu

We analyze the training dynamics for deep linear networks using a new metric - layer imbalance - which defines the flatness of a solution. We demonstrate that different regularization methods, such as weight decay or noise data…

Machine Learning · Computer Science 2020-07-21 Boris Ginsburg

Most complex machine learning and modelling techniques are prone to over-fitting and may subsequently generalise poorly to future data. Artificial neural networks are no different in this regard and, despite having a level of implicit…

Machine Learning · Statistics 2022-05-26 Vincent Szolnoky , Viktor Andersson , Balazs Kulcsar , Rebecka Jörnsten

Gradient regularization (GR) has been shown to improve the generalizability of trained models. While Natural Gradient Descent has been shown to accelerate optimization in the initial phase of training, little attention has been paid to how…

Machine Learning · Computer Science 2026-03-27 Satya Prakash Dash , Hossein Abdi , Wei Pan , Samuel Kaski , Mingfei Sun

Overfitting is one of the most critical challenges in deep neural networks, and there are various types of regularization methods to improve generalization performance. Injecting noises to hidden units during training, e.g., dropout, is…

Machine Learning · Computer Science 2017-11-10 Hyeonwoo Noh , Tackgeun You , Jonghwan Mun , Bohyung Han

The performance of deep neural networks is strongly influenced by the training dataset setup. In particular, when attributes having a strong correlation with the target attribute are present, the trained model can provide unintended…

Machine Learning · Computer Science 2023-02-14 Sumyeong Ahn , Seongyoon Kim , Se-young Yun

Applying Differentially Private Stochastic Gradient Descent (DPSGD) to training modern, large-scale neural networks such as transformer-based models is a challenging task, as the magnitude of noise added to the gradients at each iteration…

Machine Learning · Computer Science 2022-07-07 Ryuichi Ito , Seng Pei Liew , Tsubasa Takahashi , Yuya Sasaki , Makoto Onizuka
‹ Prev 1 2 3 10 Next ›