Boosting for high-dimensional linear models

Statistics Theory 2016-08-16 v1

Abstract

We prove that boosting with the squared error loss, $L_2$Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as $O(\exp(\text{sample size}))$, assuming that the true underlying regression function is sparse in terms of the $\ell_1$-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of the $\ell_1$-norm. We also propose here an $\mathit{AIC}$-based method for tuning, namely for choosing the number of boosting iterations. This makes $L_2$Boosting computationally attractive since it is not required to run the algorithm multiple times for cross-validation as commonly used so far. We demonstrate $L_2$Boosting for simulated data, in particular where the predictor dimension is large in comparison to sample size, and for a difficult tumor-classification problem with gene expression microarray data.
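To make the two ideas in the abstract concrete, the following is a minimal sketch of boosting with squared error loss plus an AIC-type choice of the number of iterations. The abstract does not spell out the algorithm, so several details here are assumptions: the base learner is componentwise linear least squares, the step size `nu` is a hypothetical shrinkage parameter, and the stopping criterion is a corrected AIC of the Hurvich-Simonoff-Tsai form, with degrees of freedom taken as the trace of the boosting hat matrix. The function name `l2boost_aic` is illustrative only.

```python
import numpy as np


def l2boost_aic(X, y, n_steps=200, nu=0.1):
    """Componentwise L2Boosting sketch with a corrected-AIC stopping rule.

    Assumptions (not taken from the abstract): componentwise least-squares
    base learner, step size `nu`, and AICc with df = trace(hat matrix).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Center predictors and response so the intercept is handled separately.
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    sq_norms = (Xc ** 2).sum(axis=0)
    sq_norms[sq_norms == 0.0] = 1.0  # constant columns then get coefficient 0

    coef = np.zeros(p)
    resid = y - y_mean
    # Hat matrix of the current fit; the initial fit is the sample mean.
    B = np.full((n, n), 1.0 / n)

    best_aic, best_m, best_coef = np.inf, 0, coef.copy()
    for m in range(1, n_steps + 1):
        # Least-squares coefficient of the residuals on each single predictor.
        b = Xc.T @ resid / sq_norms
        # Pick the predictor that reduces the residual sum of squares most.
        j = int(np.argmax(b ** 2 * sq_norms))
        xj = Xc[:, j]

        # Shrunken update of the coefficient and the residuals.
        coef[j] += nu * b[j]
        resid -= nu * b[j] * xj

        # Hat matrix update: B <- B + nu * H_j (I - B), H_j = xj xj^T / |xj|^2.
        B += nu * np.outer(xj, xj - xj @ B) / sq_norms[j]

        df = np.trace(B)
        if df + 2.0 >= n:
            break  # corrected AIC is undefined beyond this point
        sigma2 = resid @ resid / n
        aic = np.log(sigma2) + (1.0 + df / n) / (1.0 - (df + 2.0) / n)
        if aic < best_aic:
            best_aic, best_m, best_coef = aic, m, coef.copy()

    intercept = y_mean - x_mean @ best_coef
    return best_coef, intercept, best_m
```

A single call such as `coef, intercept, m_stop = l2boost_aic(X, y)` runs the boosting path once and returns the coefficients at the iteration with the smallest criterion value, which is the computational point made in the abstract: no repeated runs for cross-validation are needed. Tracking the full `n x n` hat matrix keeps the sketch short but costs `O(n^2)` memory per step, so it is only suitable for moderate sample sizes.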

Cite

@article{arxiv.math/0606789,
  title  = {Boosting for high-dimensional linear models},
  author = {Peter Bühlmann},
  journal= {arXiv preprint arXiv:math/0606789},
  year   = {2006}
}

Comments

Published at http://dx.doi.org/10.1214/009053606000000092 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)