Sparse PCA with False Discovery Rate Controlled Variable Selection
Abstract
Sparse principal component analysis (PCA) aims to map high-dimensional data to a linear subspace of lower dimension. By constraining the loading vectors to be sparse, it serves the dual purpose of dimension reduction and variable selection. Sparse PCA algorithms are usually expressed as a trade-off between explained variance and sparsity of the loading vectors (i.e., the number of selected variables). Because high explained variance does not necessarily indicate relevant information, these methods are prone to selecting irrelevant variables. To overcome this issue, we propose an alternative formulation of sparse PCA driven by the false discovery rate (FDR). We then leverage the Terminating-Random Experiments (T-Rex) selector to automatically determine an FDR-controlled support of the loading vectors. A major advantage of the resulting T-Rex PCA is that no sparsity parameter tuning is required. Numerical experiments and a stock market data example demonstrate a significant performance improvement.
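The trade-off described in the abstract can be illustrated with a minimal sketch. The code below is not the T-Rex selector and not the authors' method: it uses a generic hard-thresholded power iteration as a stand-in for a conventional sparse PCA algorithm, where the sparsity level `k` must be chosen by hand. The function names, the synthetic data model, and all parameter values are assumptions made for illustration only. The helper `false_discovery_proportion` computes the fraction of selected variables that lie outside a known true support; the FDR is the expectation of this quantity.

```python
import numpy as np


def truncated_power_iteration(S, k, n_iter=200, seed=0):
    """Sparse leading loading vector with at most k nonzero entries.

    Generic hard-thresholded power method (illustrative stand-in,
    not the T-Rex PCA of the paper).
    """
    rng = np.random.default_rng(seed)
    p = S.shape[0]
    w = rng.standard_normal(p)
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        v = S @ w
        # Keep only the k entries with the largest magnitude.
        keep = np.argsort(np.abs(v))[-k:]
        w = np.zeros(p)
        w[keep] = v[keep]
        w /= np.linalg.norm(w)
    return w


def false_discovery_proportion(selected, true_support):
    """Fraction of selected variables that are not in the true support."""
    selected, true_support = set(selected), set(true_support)
    if not selected:
        return 0.0
    return len(selected - true_support) / len(selected)


# Synthetic data (assumed model): one sparse component plus noise.
rng = np.random.default_rng(1)
n, p = 200, 20
true_support = range(5)
u = np.zeros(p)
u[list(true_support)] = 1.0 / np.sqrt(5)
X = 3.0 * rng.standard_normal((n, 1)) * u + rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)

# A larger k explains more variance but may admit irrelevant variables.
for k in (3, 5, 10):
    w = truncated_power_iteration(S, k)
    selected = np.flatnonzero(w)
    explained = float(w @ S @ w)
    fdp = false_discovery_proportion(selected, true_support)
    print(f"k={k:2d}  explained variance={explained:.3f}  FDP={fdp:.2f}")
```

In this sketch the user has to pick `k`; the abstract's point is that an FDR-driven formulation replaces this tuning with a target error rate on the selected support.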
Cite
@article{arxiv.2401.08375,
title = {Sparse PCA with False Discovery Rate Controlled Variable Selection},
author = {Jasin Machkour and Arnaud Breloy and Michael Muma and Daniel P. Palomar and Frédéric Pascal},
journal = {arXiv preprint arXiv:2401.08375},
year = {2024}
}
Comments
Published in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), scheduled for 14-19 April 2024 in Seoul, Korea