Sparse PCA with False Discovery Rate Controlled Variable Selection

Machine Learning, 2024-01-17, v1

Abstract

Sparse principal component analysis (PCA) aims at mapping high-dimensional data to a linear subspace of lower dimension. By constraining the loading vectors to be sparse, it performs the double duty of dimension reduction and variable selection. Sparse PCA algorithms are usually expressed as a trade-off between explained variance and sparsity of the loading vectors (i.e., the number of selected variables). Because high explained variance is not necessarily synonymous with relevant information, these methods are prone to selecting irrelevant variables. To overcome this issue, we propose an alternative formulation of sparse PCA driven by the false discovery rate (FDR). We then leverage the Terminating-Random Experiments (T-Rex) selector to automatically determine an FDR-controlled support of the loading vectors. A major advantage of the resulting T-Rex PCA is that no sparsity parameter tuning is required. Numerical experiments and a stock market data example demonstrate a significant performance improvement.
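
The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the T-Rex selector or the paper's formulation: it uses a plain truncated leading loading vector as a stand-in sparse PCA, and the synthetic data, the function names, and the sparsity levels are all assumptions made for illustration. It computes the false discovery proportion (FDP) of the selected support, that is, the fraction of selected variables that are irrelevant; the FDR is the expected value of this quantity.

```python
import numpy as np


def sparse_loading(X, k):
    """Leading PCA loading truncated to its k largest-magnitude entries.

    Stand-in for a sparse PCA solver (assumption, not the paper's method).
    """
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    v = vt[0]
    support = np.argsort(np.abs(v))[-k:]   # indices of the k largest entries
    w = np.zeros_like(v)
    w[support] = v[support]
    w /= np.linalg.norm(w)                  # renormalize to unit length
    return w, np.sort(support)


def explained_variance_ratio(X, w):
    """Share of total variance captured by the projection onto w."""
    Xc = X - X.mean(axis=0)
    return np.var(Xc @ w) / np.var(Xc, axis=0).sum()


def false_discovery_proportion(support, true_support):
    """Fraction of selected variables that are not truly relevant."""
    selected = set(support.tolist())
    if not selected:
        return 0.0
    return len(selected - set(true_support)) / len(selected)


# Synthetic data (assumption): 5 of 50 variables share one latent factor,
# the remaining 45 are pure noise.
rng = np.random.default_rng(0)
n, p, true_support = 200, 50, [0, 1, 2, 3, 4]
factor = rng.standard_normal(n)
X = rng.standard_normal((n, p))
X[:, true_support] += 2.0 * factor[:, None]

results = {}
for k in (5, 10, 20):                       # hand-tuned sparsity levels
    w, support = sparse_loading(X, k)
    results[k] = (
        explained_variance_ratio(X, w),
        false_discovery_proportion(support, true_support),
    )
    print(f"k={k:2d}  explained variance={results[k][0]:.3f}  FDP={results[k][1]:.2f}")
```

Because only five variables are relevant in this toy setup, any choice of `k` above five necessarily selects irrelevant variables, whatever the explained variance. This is the sensitivity to a hand-tuned sparsity level that an FDR-driven formulation is meant to avoid.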

Cite

@article{arxiv.2401.08375,
  title   = {Sparse PCA with False Discovery Rate Controlled Variable Selection},
  author  = {Jasin Machkour and Arnaud Breloy and Michael Muma and Daniel P. Palomar and Frédéric Pascal},
  journal = {arXiv preprint arXiv:2401.08375},
  year    = {2024}
}

Comments

Published in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), scheduled for 14-19 April 2024 in Seoul, Korea