Related papers: Reducing Neural Network Parameter Initialization I…

Data-driven Weight Initialization with Sylvester Solvers

In this work, we propose a data-driven scheme to initialize the parameters of a deep neural network. This is in contrast to traditional approaches which randomly initialize parameters by sampling from transformed standard distributions.…

Neural and Evolutionary Computing · Computer Science 2021-05-24 Debasmit Das, Yash Bhalgat, Fatih Porikli

Multilevel Initialization for Layer-Parallel Deep Neural Network Training

This paper investigates multilevel initialization strategies for training very deep neural networks with a layer-parallel multigrid solver. The scheme is based on the continuous interpretation of the training problem as a problem of optimal…

Machine Learning · Computer Science 2019-12-20 Eric C. Cyr, Stefanie Günther, Jacob B. Schroder

Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy

Despite the recent success of stochastic gradient descent in deep learning, it is often difficult to train a deep neural network with an inappropriate choice of its initial parameters. Even if training is successful, it has been known that…

Machine Learning · Computer Science 2023-02-10 Cheolhyoung Lee, Kyunghyun Cho

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

Residual networks (ResNet) and weight normalization play an important role in various deep learning applications. However, parameter initialization strategies have not been studied previously for weight normalized networks and, in practice,…

Machine Learning · Statistics 2019-10-31 Devansh Arpit, Victor Campos, Yoshua Bengio

On the Initialization of Long Short-Term Memory Networks

Weight initialization is important for faster convergence and stability of deep neural networks training. In this paper, a robust initialization method is developed to address the training instability in long short-term memory (LSTM)…

Machine Learning · Computer Science 2019-12-24 Mostafa Mehdipour Ghazi, Mads Nielsen, Akshay Pai, Marc Modat, M. Jorge Cardoso, Sebastien Ourselin, Lauge Sorensen

Deep activity propagation via weight initialization in spiking neural networks

Spiking Neural Networks (SNNs) and neuromorphic computing offer bio-inspired advantages such as sparsity and ultra-low power consumption, providing a promising alternative to conventional networks. However, training deep SNNs from scratch…

Computer Vision and Pattern Recognition · Computer Science 2025-05-21 Aurora Micheli, Olaf Booij, Jan van Gemert, Nergis Tömen

Parameter Re-Initialization through Cyclical Batch Size Schedules

Optimal parameter initialization remains a crucial problem for neural network training. A poor weight initialization may take longer to train and/or converge to sub-optimal solutions. Here, we propose a method of weight re-initialization by…

Machine Learning · Computer Science 2021-04-21 Norman Mu, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael Mahoney

Robust Pruning at Initialization

Overparameterized Neural Networks (NN) display state-of-the-art performance. However, there is a growing need for smaller, energy-efficient, neural networks to be able to use machine learning applications on devices with limited…

Machine Learning · Statistics 2021-05-21 Soufiane Hayou, Jean-Francois Ton, Arnaud Doucet, Yee Whye Teh

Initializing Models with Larger Ones

Weight initialization plays an important role in neural network training. Widely used initialization methods are proposed and evaluated for networks that are trained from scratch. However, the growing number of pretrained models now offers…

Machine Learning · Computer Science 2023-12-01 Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu

Neural network initialization with nonlinear characteristics and information on hierarchical features

Initialization of neural network parameters, such as weights and biases, has a crucial impact on learning performance; if chosen well, we can even avoid the need for additional training with backpropagation. For example, algorithms based on…

Machine Learning · Computer Science 2026-03-16 Hikaru Homma, Jun Ohkubo

Guiding Neural Network Initialization via Marginal Likelihood Maximization

We propose a simple, data-driven approach to help guide hyperparameter selection for neural network initialization. We leverage the relationship between neural network and Gaussian process models having corresponding activation and…

Machine Learning · Statistics 2020-12-21 Anthony S. Tai, Chunfeng Huang

Nonparametric Weight Initialization of Neural Networks via Integral Representation

A new initialization method for hidden parameters in a neural network is proposed. Derived from the integral representation of the neural network, a nonparametric probability distribution of hidden parameters is introduced. In this…

Machine Learning · Computer Science 2014-02-20 Sho Sonoda, Noboru Murata

Supervised level-wise pretraining for recurrent neural network initialization in multi-class classification

Recurrent Neural Networks (RNNs) can be seriously impacted by the initial parameters assignment, which may result in poor generalization performances on new unseen data. With the objective to tackle this crucial issue, in the context of RNN…

Machine Learning · Computer Science 2019-11-05 Dino Ienco, Roberto Interdonato, Raffaele Gaetano

Target noise: A pre-training based neural network initialization for efficient high resolution learning

Weight initialization plays a crucial role in the optimization behavior and convergence efficiency of neural networks. Most existing initialization methods, such as Xavier and Kaiming initializations, rely on random sampling and do not…

Machine Learning · Computer Science 2026-02-09 Shaowen Wang, Tariq Alkhalifah

Starting Positions Matter: A Study on Better Weight Initialization for Neural Network Quantization

Deep neural network (DNN) quantization for fast, efficient inference has been an important tool in limiting the cost of machine learning (ML) model inference. Quantization-specific model development techniques such as regularization,…

Computer Vision and Pattern Recognition · Computer Science 2025-06-13 Stone Yun, Alexander Wong

A Sober Look at Neural Network Initializations

Initializing the weights and the biases is a key part of the training process of a neural network. Unlike the subsequent optimization phase, however, the initialization phase has gained only limited attention in the literature. In this…

Machine Learning · Computer Science 2019-09-06 Ingo Steinwart

Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit

To theoretically understand the behavior of trained deep neural networks, it is necessary to study the dynamics induced by gradient methods from a random initialization. However, the nonlinear and compositional structure of these models…

Machine Learning · Computer Science 2021-12-21 Karl Hajjar, Lénaïc Chizat, Christophe Giraud

MSE-Optimal Neural Network Initialization via Layer Fusion

Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks. However, the use of stochastic gradient descent combined with the nonconvexity of the underlying optimization problems renders…

Machine Learning · Computer Science 2020-01-29 Ramina Ghods, Andrew S. Lan, Tom Goldstein, Christoph Studer

Initialization of ReLUs for Dynamical Isometry

Deep learning relies on good initialization schemes and hyperparameter choices prior to training a neural network. Random weight initializations induce random network ensembles, which give rise to the trainability, training speed, and…

Machine Learning · Statistics 2019-10-25 Rebekka Burkholz, Alina Dubatovka

Weights initialization of neural networks for function approximation

Neural network-based function approximation plays a pivotal role in the advancement of scientific computing and machine learning. Yet, training such models faces several challenges: (i) each target function often requires training a new…

Machine Learning · Computer Science 2025-10-13 Xinwen Hu, Yunqing Huang, Nianyu Yi, Peimeng Yin