Related papers: When Does Re-initialization Work?

The Impact of Reinitialization on Generalization in Convolutional Neural Networks

Recent results suggest that reinitializing a subset of the parameters of a neural network during training can improve generalization, particularly for small training sets. We study the impact of different reinitialization methods in several…

Machine Learning · Computer Science 2021-09-02 Ibrahim Alabdulmohsin , Hartmut Maennel , Daniel Keysers

Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence

Regularization is typically understood as improving generalization by altering the landscape of local extrema to which the model eventually converges. Deep neural networks (DNNs), however, challenge this view: We show that removing…

Machine Learning · Computer Science 2019-06-03 Aditya Golatkar , Alessandro Achille , Stefano Soatto

Regularisation in neural networks: a survey and empirical analysis of approaches

Despite huge successes on a wide range of tasks, neural networks are known to sometimes struggle to generalise to unseen data. Many approaches have been proposed over the years to promote the generalisation ability of neural networks,…

Machine Learning · Computer Science 2026-02-02 Christiaan P. Opperman , Anna S. Bosman , Katherine M. Malan

Training Deep Neural Networks Without Batch Normalization

Training neural networks is an optimization problem, and finding a decent set of parameters through gradient descent can be a difficult task. A host of techniques has been developed to aid this process before and during the training phase.…

Machine Learning · Computer Science 2020-08-19 Divya Gaur , Joachim Folz , Andreas Dengel

Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes

Normalization techniques have only recently begun to be exploited in supervised learning tasks. Batch normalization exploits mini-batch statistics to normalize the activations. This was shown to speed up training and result in better…

Machine Learning · Computer Science 2017-03-08 Mengye Ren , Renjie Liao , Raquel Urtasun , Fabian H. Sinz , Richard S. Zemel

Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee

Over-parameterized deep neural networks trained by simple first-order methods are known to be able to fit any labeling of data. Such over-fitting ability hinders generalization when mislabeled training examples are present. On the other…

Machine Learning · Computer Science 2020-10-06 Wei Hu , Zhiyuan Li , Dingli Yu

Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization

Overfitting is one of the most critical challenges in deep neural networks, and there are various types of regularization methods to improve generalization performance. Injecting noises to hidden units during training, e.g., dropout, is…

Machine Learning · Computer Science 2017-11-10 Hyeonwoo Noh , Tackgeun You , Jonghwan Mun , Bohyung Han

Layer Normalization

Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the…

Machine Learning · Statistics 2016-07-22 Jimmy Lei Ba , Jamie Ryan Kiros , Geoffrey E. Hinton

Regularization Matters in Policy Optimization

Deep Reinforcement Learning (Deep RL) has been receiving increasingly more attention thanks to its encouraging performance on a variety of control tasks. Yet, conventional regularization techniques in training neural networks (e.g., $L_2$…

Machine Learning · Computer Science 2021-11-30 Zhuang Liu , Xuanlin Li , Bingyi Kang , Trevor Darrell

Accelerating Training of Deep Neural Networks with a Standardization Loss

A significant advance in accelerating neural network training has been the development of normalization methods, permitting the training of deep models both faster and with better accuracy. These advances come with practical challenges: for…

Machine Learning · Computer Science 2019-03-05 Jasmine Collins , Johannes Balle , Jonathon Shlens

Normalization Techniques in Training DNNs: Methodology, Analysis and Application

Normalization techniques are essential for accelerating the training and improving the generalization of deep neural networks (DNNs), and have successfully been used in various applications. This paper reviews and comments on the past,…

Machine Learning · Computer Science 2020-09-29 Lei Huang , Jie Qin , Yi Zhou , Fan Zhu , Li Liu , Ling Shao

Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels

In recent years, research on learning with noisy labels has focused on devising novel algorithms that can achieve robustness to noisy training labels while generalizing to clean data. These algorithms often incorporate sophisticated…

Machine Learning · Computer Science 2023-07-12 Hui Kang , Sheng Liu , Huaxi Huang , Jun Yu , Bo Han , Dadong Wang , Tongliang Liu

Mean Shift Rejection: Training Deep Neural Networks Without Minibatch Statistics or Normalization

Deep convolutional neural networks are known to be unstable during training at high learning rate unless normalization techniques are employed. Normalizing weights or activations allows the use of higher learning rates, resulting in faster…

Machine Learning · Computer Science 2019-12-02 Brendan Ruff , Taylor Beck , Joscha Bach

Normalization and effective learning rates in reinforcement learning

Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature, with several works highlighting diverse benefits such as improving loss landscape conditioning and combatting…

Machine Learning · Computer Science 2024-07-03 Clare Lyle , Zeyu Zheng , Khimya Khetarpal , James Martens , Hado van Hasselt , Razvan Pascanu , Will Dabney

Pruning Before Training May Improve Generalization, Provably

It has been observed in practice that applying pruning-at-initialization methods to neural networks and training the sparsified networks can not only retain the testing performance of the original dense models, but also sometimes even…

Machine Learning · Computer Science 2023-01-31 Hongru Yang , Yingbin Liang , Xiaojie Guo , Lingfei Wu , Zhangyang Wang

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

Residual networks (ResNet) and weight normalization play an important role in various deep learning applications. However, parameter initialization strategies have not been studied previously for weight normalized networks and, in practice,…

Machine Learning · Statistics 2019-10-31 Devansh Arpit , Victor Campos , Yoshua Bengio

New Interpretations of Normalization Methods in Deep Learning

In recent years, a variety of normalization methods have been proposed to help train neural networks, such as batch normalization (BN), layer normalization (LN), weight normalization (WN), group normalization (GN), etc. However,…

Machine Learning · Computer Science 2020-06-17 Jiacheng Sun , Xiangyong Cao , Hanwen Liang , Weiran Huang , Zewei Chen , Zhenguo Li

Regularizing Recurrent Networks - On Injected Noise and Norm-based Methods

Advancements in parallel processing have lead to a surge in multilayer perceptrons' (MLP) applications and deep learning in the past decades. Recurrent Neural Networks (RNNs) give additional representational power to feedforward MLPs by…

Machine Learning · Statistics 2014-10-22 Saahil Ognawala , Justin Bayer

Parameter Re-Initialization through Cyclical Batch Size Schedules

Optimal parameter initialization remains a crucial problem for neural network training. A poor weight initialization may take longer to train and/or converge to sub-optimal solutions. Here, we propose a method of weight re-initialization by…

Machine Learning · Computer Science 2021-04-21 Norman Mu , Zhewei Yao , Amir Gholami , Kurt Keutzer , Michael Mahoney

Is Feature Diversity Necessary in Neural Network Initialization?

Standard practice in training neural networks involves initializing the weights in an independent fashion. The results of recent work suggest that feature "diversity" at initialization plays an important role in training the network.…

Machine Learning · Computer Science 2020-07-06 Yaniv Blumenfeld , Dar Gilboa , Daniel Soudry