Related papers: Fixup Initialization: Residual Learning Without No…

IDInit: A Universal and Stable Initialization Method for Neural Network Training

Deep neural networks have achieved remarkable accomplishments in practice. The success of these networks hinges on effective initialization methods, which are vital for ensuring stable and rapid convergence during training. Recently,…

Machine Learning · Computer Science 2025-03-11 Yu Pan, Chaozheng Wang, Zekai Wu, Qifan Wang, Min Zhang, Zenglin Xu

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

Residual networks (ResNet) and weight normalization play an important role in various deep learning applications. However, parameter initialization strategies have not been studied previously for weight normalized networks and, in practice,…

Machine Learning · Statistics 2019-10-31 Devansh Arpit, Victor Campos, Yoshua Bengio

ExplainFix: Explainable Spatially Fixed Deep Networks

Is there an initialization for deep networks that requires no learning? ExplainFix adopts two design principles: the "fixed filters" principle that all spatial filter weights of convolutional neural networks can be fixed at initialization…

Computer Vision and Pattern Recognition · Computer Science 2023-03-21 Alex Gaudio, Christos Faloutsos, Asim Smailagic, Pedro Costa, Aurelio Campilho

Normalization and effective learning rates in reinforcement learning

Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature, with several works highlighting diverse benefits such as improving loss landscape conditioning and combatting…

Machine Learning · Computer Science 2024-07-03 Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, Hado van Hasselt, Razvan Pascanu, Will Dabney

Farkas layers: don't shift the data, fix the geometry

Successfully training deep neural networks often requires either batch normalization or appropriate weight initialization, both of which come with their own challenges. We propose an alternative, geometrically motivated method for training.…

Machine Learning · Computer Science 2019-10-08 Aram-Alexandre Pooladian, Chris Finlay, Adam M Oberman

When Does Re-initialization Work?

Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols. This…

Machine Learning · Computer Science 2023-04-04 Sheheryar Zaidi, Tudor Berariu, Hyunjik Kim, Jörg Bornschein, Claudia Clopath, Yee Whye Teh, Razvan Pascanu

Multilevel Initialization for Layer-Parallel Deep Neural Network Training

This paper investigates multilevel initialization strategies for training very deep neural networks with a layer-parallel multigrid solver. The scheme is based on the continuous interpretation of the training problem as a problem of optimal…

Machine Learning · Computer Science 2019-12-20 Eric C. Cyr, Stefanie Günther, Jacob B. Schroder

Shrinkage Initialization for Smooth Learning of Neural Networks

The success of intelligent systems has relied heavily on the artificial learning of information, which has led to broad applications of neural learning solutions. It is commonly understood that the training of neural networks can be largely improved…

Machine Learning · Computer Science 2025-04-15 Miao Cheng, Feiyan Zhou, Hongwei Zou, Limin Wang

Deep Residual Networks and Weight Initialization

Residual Network (ResNet) is the state-of-the-art architecture that enables successful training of very deep neural networks. It is also known that good weight initialization of a neural network avoids the problem of vanishing/exploding…

Machine Learning · Computer Science 2017-10-16 Masato Taki

Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used. We here investigate this phenomenon by revisiting…

Machine Learning · Statistics 2020-06-15 Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Noise Injection Node Regularization for Robust Learning

We introduce Noise Injection Node Regularization (NINR), a method of injecting structured noise into Deep Neural Networks (DNN) during the training stage, resulting in an emergent regularizing effect. We present theoretical and empirical…

Machine Learning · Computer Science 2023-05-03 Noam Levi, Itay M. Bloch, Marat Freytsis, Tomer Volansky

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks

Batch normalization dramatically increases the largest trainable depth of residual networks, and this benefit has been crucial to the empirical success of deep residual networks on a wide range of benchmarks. We show that this key benefit…

Machine Learning · Computer Science 2020-12-10 Soham De, Samuel L. Smith

PatchUp: A Feature-Space Block-Level Regularization Technique for Convolutional Neural Networks

Large capacity deep learning models are often prone to a high generalization gap when trained with a limited amount of labeled training data. A recent class of methods to address this problem uses various ways to construct a new training…

Machine Learning · Computer Science 2023-01-10 Mojtaba Faramarzi, Mohammad Amini, Akilesh Badrinaaraayanan, Vikas Verma, Sarath Chandar

Is Feature Diversity Necessary in Neural Network Initialization?

Standard practice in training neural networks involves initializing the weights in an independent fashion. The results of recent work suggest that feature "diversity" at initialization plays an important role in training the network.…

Machine Learning · Computer Science 2020-07-06 Yaniv Blumenfeld, Dar Gilboa, Daniel Soudry

Channel Normalization in Convolutional Neural Network avoids Vanishing Gradients

Normalization layers are widely used in deep neural networks to stabilize training. In this paper, we consider the training of convolutional neural networks with gradient descent on a single training example. This optimization problem…

Machine Learning · Computer Science 2019-07-24 Zhenwei Dai, Reinhard Heckel

Training Thinner and Deeper Neural Networks: Jumpstart Regularization

Neural networks are more expressive when they have multiple layers. In turn, conventional training methods are only successful if the depth does not lead to numerical issues such as exploding or vanishing gradients, which occur less…

Machine Learning · Computer Science 2022-06-07 Carles Riera, Camilo Rey, Thiago Serra, Eloi Puertas, Oriol Pujol

Revisiting Batch Norm Initialization

Batch normalization (BN) is comprised of a normalization component followed by an affine transformation and has become essential for training deep neural networks. Standard initialization of each BN in a network sets the affine…

Computer Vision and Pattern Recognition · Computer Science 2022-07-18 Jim Davis, Logan Frank

Robust learning with implicit residual networks

In this effort, we propose a new deep architecture utilizing residual blocks inspired by implicit discretization schemes. As opposed to the standard feed-forward networks, the outputs of the proposed implicit residual blocks are defined as…

Machine Learning · Computer Science 2021-02-23 Viktor Reshniak, Clayton Webster

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training

Innovations in neural architectures have fostered significant breakthroughs in language modeling and computer vision. Unfortunately, novel architectures often result in challenging hyper-parameter choices and training instability if the…

Machine Learning · Computer Science 2021-11-25 Chen Zhu, Renkun Ni, Zheng Xu, Kezhi Kong, W. Ronny Huang, Tom Goldstein

When does mixup promote local linearity in learned representations?

Mixup is a regularization technique that artificially produces new samples using convex combinations of original training points. This simple technique has shown strong empirical performance, and has been heavily used as part of…

Machine Learning · Computer Science 2022-11-01 Arslan Chaudhry, Aditya Krishna Menon, Andreas Veit, Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar