Boosting for high-dimensional linear models

Peter Bühlmann

doi:10.1214/009053606000000092

Statistics Theory, 2016-08-16, v1

Authors: Peter Bühlmann

Abstract

We prove that boosting with the squared error loss, $L_2$ Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as $O(\exp(\text{sample size}))$, assuming that the true underlying regression function is sparse in terms of the $\ell_1$-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of the $\ell_1$-norm. We also propose here an AIC-based method for tuning, namely for choosing the number of boosting iterations. This makes $L_2$ Boosting computationally attractive since it is not required to run the algorithm multiple times for cross-validation as commonly used so far. We demonstrate $L_2$ Boosting for simulated data, in particular where the predictor dimension is large in comparison to sample size, and for a difficult tumor-classification problem with gene expression microarray data.
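As a rough illustration of the procedure the abstract describes, the following is a minimal sketch of componentwise $L_2$ Boosting with an AIC-type stopping rule. It is not taken from the paper's text as shown here. The function name `l2_boost`, the step size `nu = 0.1`, the centring of the predictors, and the specific corrected-AIC formula (with degrees of freedom taken as the trace of the boosting hat matrix) are all assumptions made for this example, and the simulated data are purely illustrative.

```python
import numpy as np


def l2_boost(X, y, nu=0.1, max_iter=200):
    """Componentwise L2 boosting, stopped at the minimum of a corrected AIC.

    Assumptions of this sketch: shrinkage factor nu, centred predictors,
    and AICc = log(sigma2) + (1 + df/n) / (1 - (df + 2)/n), df = trace(B_m).
    """
    n, p = X.shape
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                          # centred predictors
    col_ss = (Xc ** 2).sum(axis=0)           # squared norm of each column
    y_mean = y.mean()
    resid = y - y_mean                       # residuals after fitting the mean
    coef = np.zeros(p)
    R = np.eye(n) - np.ones((n, n)) / n      # R = I - B_0 (B_0 fits the mean)
    best = (np.inf, 0, coef.copy())          # (AICc, iteration, coefficients)

    for m in range(1, max_iter + 1):
        score = Xc.T @ resid
        j = int(np.argmax(score ** 2 / col_ss))   # predictor reducing RSS most
        step = nu * score[j] / col_ss[j]
        coef[j] += step
        resid = resid - step * Xc[:, j]
        # Hat-matrix update: I - B_m = (I - nu * H_j) (I - B_{m-1})
        R -= nu * np.outer(Xc[:, j], Xc[:, j] @ R) / col_ss[j]
        df = n - np.trace(R)                 # trace(B_m)
        if df + 2 >= n:                      # criterion undefined beyond here
            break
        sigma2 = resid @ resid / n
        aicc = np.log(sigma2) + (1 + df / n) / (1 - (df + 2) / n)
        if aicc < best[0]:
            best = (aicc, m, coef.copy())

    _, m_stop, coef = best
    intercept = y_mean - x_mean @ coef
    return intercept, coef, m_stop


# Illustrative simulated data: more predictors than observations, sparse truth.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [5.0, -4.0, 3.0]
y = X @ beta + 0.5 * rng.standard_normal(n)

intercept, coef, m_stop = l2_boost(X, y)
```

Because the hat matrix is updated alongside the residuals, the stopping iteration is chosen in a single pass over the boosting path, which is the computational advantage over repeated cross-validation runs that the abstract points to.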

Keywords

model selection, statistical inference, signal detection

Cite

@article{arxiv.math/0606789,
  title  = {Boosting for high-dimensional linear models},
  author = {Peter Bühlmann},
  journal= {arXiv preprint arXiv:math/0606789},
  year   = {2016}
}

Comments

Published at http://dx.doi.org/10.1214/009053606000000092 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
