Data-driven Weight Initialization with Sylvester Solvers

Neural and Evolutionary Computing 2021-05-24 v1 Computer Vision and Pattern Recognition Machine Learning

Abstract

In this work, we propose a data-driven scheme to initialize the parameters of a deep neural network. This is in contrast to traditional approaches, which randomly initialize parameters by sampling from transformed standard distributions. Such methods do not use the training data to produce a more informed initialization. Our method uses a sequential layer-wise approach in which each layer is initialized using its input activations. The initialization is cast as an optimization problem in which we minimize a combination of encoding and decoding losses of the input activations, further constrained by a user-defined latent code. The optimization problem is then restructured into the well-known Sylvester equation, which has fast and efficient gradient-free solutions. Our data-driven method improves performance over random initialization methods, both before training starts and after training completes. We show that our proposed method is especially effective in few-shot and fine-tuning settings. We conclude this paper with analyses of time complexity and of the effect of different latent codes on recognition performance.
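
The reduction to a Sylvester equation can be illustrated with a minimal sketch for a single linear layer. The abstract does not state the exact objective, so the following is an assumption: a tied-weight loss ||WX - Z||^2 + lam * ||W^T Z - X||^2, where X holds the layer's input activations, Z is the user-defined latent code, and lam weights the decoding term. Setting the gradient to zero gives (lam * Z Z^T) W + W (X X^T) = (1 + lam) Z X^T, which has the Sylvester form A W + W B = Q. The function name `sylvester_init` and the random data are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_sylvester


def sylvester_init(X, Z, lam=1.0):
    """Initialize one layer's weights W (d_out x d_in) without gradient steps.

    Assumed objective (not taken from the paper's text):
        ||W X - Z||_F^2 + lam * ||W^T Z - X||_F^2
    X: input activations, shape (d_in, n)
    Z: user-defined latent code, shape (d_out, n)
    """
    A = lam * (Z @ Z.T)          # (d_out, d_out), from the decoding loss
    B = X @ X.T                  # (d_in, d_in), from the encoding loss
    Q = (1.0 + lam) * (Z @ X.T)  # (d_out, d_in), right-hand side
    # scipy solves A W + W B = Q for W.
    return solve_sylvester(A, B, Q)


# Hypothetical data: 200 samples, 8 input features, 4 latent dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 200))
Z = rng.standard_normal((4, 200))
W = sylvester_init(X, Z, lam=0.5)
```

In a layer-wise scheme, the activations produced by one initialized layer would serve as X for the next. The solution is unique when A and -B share no eigenvalues, which holds here whenever X X^T is positive definite.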

Cite

@article{arxiv.2105.10335,
  title  = {Data-driven Weight Initialization with Sylvester Solvers},
  author = {Debasmit Das and Yash Bhalgat and Fatih Porikli},
  journal= {arXiv preprint arXiv:2105.10335},
  year   = {2021}
}

Comments

Practical Machine Learning for Developing Countries Workshop, International Conference on Learning Representations, 2021
