Dropout Training as Adaptive Regularization

Stefan Wager; Sida Wang; Percy Liang

Machine Learning 2013-11-04 v2 Machine Learning Methodology

Abstract

Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
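To make the adaptive-regularization claim concrete, the following is a minimal sketch for logistic regression. It assumes the quadratic approximation to the dropout penalty takes the form (1/2) · δ/(1−δ) · Σ_i p_i(1−p_i) Σ_j x_ij² β_j², where δ is the dropout probability and p_i is the predicted probability for example i. The function names, the toy data, and the pure-Python implementation are illustrative assumptions, not the authors' code. The sketch shows that the same quantity can be written as an L2 penalty weighted by the diagonal of an estimated Fisher information matrix.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def dropout_quadratic_penalty(X, beta, delta):
    """Assumed quadratic approximation to the dropout penalty.

    X is a list of feature rows, beta the weight vector, and delta the
    dropout probability in [0, 1).
    """
    scale = delta / (1.0 - delta)
    total = 0.0
    for x in X:
        p = sigmoid(sum(xj * bj for xj, bj in zip(x, beta)))
        curvature = p * (1.0 - p)  # second derivative of the log-partition
        total += curvature * sum((xj * bj) ** 2 for xj, bj in zip(x, beta))
    return 0.5 * scale * total


def fisher_diagonal(X, beta):
    """Diagonal of the estimated Fisher information, sum_i p_i (1 - p_i) x_ij^2."""
    diag = [0.0] * len(beta)
    for x in X:
        p = sigmoid(sum(xj * bj for xj, bj in zip(x, beta)))
        for j, xj in enumerate(x):
            diag[j] += p * (1.0 - p) * xj * xj
    return diag


def fisher_scaled_l2(X, beta, delta):
    """The same penalty written as an L2 norm weighted by the Fisher diagonal."""
    diag = fisher_diagonal(X, beta)
    scale = delta / (1.0 - delta)
    return 0.5 * scale * sum(d * b * b for d, b in zip(diag, beta))


# Hypothetical toy data, for illustration only.
X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
beta = [0.5, -1.0, 0.25]

penalty_a = dropout_quadratic_penalty(X, beta, delta=0.5)
penalty_b = fisher_scaled_l2(X, beta, delta=0.5)
```

Under these assumptions the two functions agree up to floating-point error, since they only reorder the same double sum. The penalty on each weight grows with the curvature-weighted energy of its feature, so features that are rarely active or that appear mostly on confidently classified examples are penalized less.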

Keywords

regularization in machine learning; online learning; deep learning

Cite

@article{arxiv.1307.1493,
  title  = {Dropout Training as Adaptive Regularization},
  author = {Stefan Wager and Sida Wang and Percy Liang},
  journal= {arXiv preprint arXiv:1307.1493},
  year   = {2013}
}

Comments

11 pages. Advances in Neural Information Processing Systems (NIPS), 2013
