Feature selection in omics prediction problems using cat scores and false nondiscovery rate control

Applications 2010-10-11 v4 Machine Learning

Abstract

We revisit the problem of feature selection in linear discriminant analysis (LDA), that is, when features are correlated. First, we introduce a pooled centroids formulation of the multiclass LDA predictor function, in which the relative weights of Mahalanobis-transformed predictors are given by correlation-adjusted t-scores (cat scores). Second, for feature selection we propose thresholding cat scores by controlling false nondiscovery rates (FNDR). Third, training of the classifier is based on James-Stein shrinkage estimates of correlations and variances, where regularization parameters are chosen analytically without resampling. Overall, this results in an effective and computationally inexpensive framework for high-dimensional prediction with natural feature selection. The proposed shrinkage discriminant procedures are implemented in the R package "sda", available from the R repository CRAN.
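To make the idea of a correlation-adjusted t-score concrete, here is a minimal sketch for the special case of two features. It assumes the t-scores and the correlation between the features are already given, and it decorrelates the t-scores by multiplying them with the inverse square root of the 2x2 correlation matrix, which has a closed form. The function names are hypothetical, this is not the API of the "sda" package, and it uses a fixed correlation value rather than the shrinkage estimates described above.

```python
import math


def inv_sqrt_corr_2x2(r):
    """Inverse matrix square root of the correlation matrix [[1, r], [r, 1]].

    The eigenvalues are 1 + r and 1 - r, with eigenvectors (1, 1)/sqrt(2)
    and (1, -1)/sqrt(2), so the result is again a symmetric 2x2 matrix.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    p = 1.0 / math.sqrt(1.0 + r)
    m = 1.0 / math.sqrt(1.0 - r)
    a = 0.5 * (p + m)  # diagonal entries
    b = 0.5 * (p - m)  # off-diagonal entries
    return [[a, b], [b, a]]


def cat_scores_2(t, r):
    """Decorrelate two t-scores t = [t1, t2] given their feature correlation r."""
    w = inv_sqrt_corr_2x2(r)
    return [
        w[0][0] * t[0] + w[0][1] * t[1],
        w[1][0] * t[0] + w[1][1] * t[1],
    ]


if __name__ == "__main__":
    # Illustrative numbers only (assumed, not taken from the paper).
    print(cat_scores_2([3.0, 1.0], 0.5))
```

With zero correlation the scores are returned unchanged, and in general the sum of squared adjusted scores equals the quadratic form of the t-scores in the inverse correlation matrix, so each squared score can be read as one feature's share of that total.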

Cite

@article{arxiv.0903.2003,
  title  = {Feature selection in omics prediction problems using cat scores and false nondiscovery rate control},
  author = {Miika Ahdesmäki and Korbinian Strimmer},
  journal= {arXiv preprint arXiv:0903.2003},
  year   = {2010}
}

Comments

Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOAS277
