Related papers: When Will Gradient Regularization Be Harmful?

Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias

Gradient regularization (GR) is a method that penalizes the gradient norm of the training loss during training. While some studies have reported that GR can improve generalization performance, little attention has been paid to it from the…

Machine Learning · Computer Science 2023-02-06 Ryo Karakida, Tomoumi Takase, Tomohiro Hayase, Kazuki Osawa

Adaptive Gradient Regularization: A Faster and Generalizable Optimization Technique for Deep Neural Networks

Stochastic optimization plays a crucial role in the advancement of deep learning technologies. Over the decades, significant effort has been dedicated to improving the training efficiency and robustness of deep neural networks, via various…

Machine Learning · Computer Science 2024-08-21 Huixiu Jiang, Ling Yang, Yu Bao, Rutong Si, Sikun Yang

Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters

Hyperparameter selection generally relies on running multiple full training trials, with selection based on validation set performance. We propose a gradient-based approach for locally adjusting hyperparameters during training of the model.…

Machine Learning · Computer Science 2016-06-20 Jelena Luketina, Mathias Berglund, Klaus Greff, Tapani Raiko

Gradient Regularized Natural Gradients

Gradient regularization (GR) has been shown to improve the generalizability of trained models. While Natural Gradient Descent has been shown to accelerate optimization in the initial phase of training, little attention has been paid to how…

Machine Learning · Computer Science 2026-03-27 Satya Prakash Dash, Hossein Abdi, Wei Pan, Samuel Kaski, Mingfei Sun

Gradient Regularization Improves Accuracy of Discriminative Models

Regularizing the gradient norm of the output of a neural network with respect to its inputs is a powerful technique, rediscovered several times. This paper presents evidence that gradient regularization can consistently improve…

Machine Learning · Computer Science 2018-05-28 Dániel Varga, Adrián Csiszárik, Zsolt Zombori

A Unified Gradient Regularization Family for Adversarial Examples

Adversarial examples are augmented data points generated by imperceptible perturbation of input samples. They have recently drawn much attention with the machine learning and data mining community. Being difficult to distinguish from real…

Machine Learning · Computer Science 2016-03-03 Chunchuan Lyu, Kaizhu Huang, Hai-Ning Liang

Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning, especially for severely overparameterized networks nowadays. In this paper, we propose an effective method to improve the model…

Machine Learning · Computer Science 2022-06-28 Yang Zhao, Hao Zhang, Xiuyuan Hu

Characterizing Model Robustness via Natural Input Gradients

Adversarially robust models are locally smooth around each data sample so that small perturbations cannot drastically change model outputs. In modern systems, such smoothness is usually obtained via Adversarial Training, which explicitly…

Machine Learning · Computer Science 2024-10-01 Adrián Rodríguez-Muñoz, Tongzhou Wang, Antonio Torralba

Improving Resistance to Adversarial Deformations by Regularizing Gradients

Improving the resistance of deep neural networks against adversarial attacks is important for deploying models to realistic applications. However, most defense methods are designed to defend against intensity perturbations and ignore…

Machine Learning · Computer Science 2020-10-07 Pengfei Xia, Bin Li

Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent

We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods. Our approach modifies the objective function by adding a regularization term on the learning…

Machine Learning · Computer Science 2021-04-13 Guangzeng Xie, Hao Jin, Dachao Lin, Zhihua Zhang

Per-Example Gradient Regularization Improves Learning Signals from Noisy Data

Gradient regularization, as described in Barrett and Dherin (2021), is a highly effective technique for promoting flat minima during gradient descent. Empirical evidence suggests that this regularization technique can significantly…

Machine Learning · Statistics 2023-04-03 Xuran Meng, Yuan Cao, Difan Zou

Gradient-Coherent Strong Regularization for Deep Neural Networks

Regularization plays an important role in generalization of deep neural networks, which are often prone to overfitting with their numerous parameters. L1 and L2 regularizers are common regularization tools in machine learning with their…

Machine Learning · Computer Science 2019-10-21 Dae Hoon Park, Chiu Man Ho, Yi Chang, Huaqing Zhang

GReAT: A Graph Regularized Adversarial Training Method

This paper presents GReAT (Graph Regularized Adversarial Training), a novel regularization method designed to enhance the robust classification performance of deep learning models. Adversarial examples, characterized by subtle perturbations…

Machine Learning · Computer Science 2024-05-06 Samet Bayram, Kenneth Barner

On the Interpretability of Regularisation for Neural Networks Through Model Gradient Similarity

Most complex machine learning and modelling techniques are prone to over-fitting and may subsequently generalise poorly to future data. Artificial neural networks are no different in this regard and, despite having a level of implicit…

Machine Learning · Statistics 2022-05-26 Vincent Szolnoky, Viktor Andersson, Balazs Kulcsar, Rebecka Jörnsten

SGDR: Stochastic Gradient Descent with Warm Restarts

Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial warm restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence in accelerated gradient…

Machine Learning · Computer Science 2017-05-04 Ilya Loshchilov, Frank Hutter

Implicit Gradient Regularization

Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient…

Machine Learning · Computer Science 2022-07-20 David G. T. Barrett, Benoit Dherin

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

Despite overparameterization, deep networks trained via supervised learning are easy to optimize and exhibit excellent generalization. One hypothesis to explain this is that overparameterized deep networks enjoy the benefits of implicit…

Machine Learning · Computer Science 2021-12-10 Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

Gradient Monitored Reinforcement Learning

This paper presents a novel neural network training approach for faster convergence and better generalization abilities in deep reinforcement learning. Particularly, we focus on the enhancement of training and evaluation performance in…

Machine Learning · Computer Science 2020-05-26 Mohammed Sharafath Abdul Hameed, Gavneet Singh Chadha, Andreas Schwung, Steven X. Ding

Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards

Reinforcement Learning from Human Feedback (RLHF) or Verifiable Rewards (RLVR) are two key steps in the post-training of modern Language Models (LMs). A common problem is reward hacking, where the policy may exploit inaccuracies of the…

Machine Learning · Computer Science 2026-02-23 Johannes Ackermann, Michael Noukhovitch, Takashi Ishida, Masashi Sugiyama

Gradient-based Regularization Parameter Selection for Problems with Non-smooth Penalty Functions

In high-dimensional and/or non-parametric regression problems, regularization (or penalization) is used to control model complexity and induce desired structure. Each penalty has a weight parameter that indicates how strongly the structure…

Machine Learning · Statistics 2017-03-30 Jean Feng, Noah Simon