Automatic sparse PCA for high-dimensional data

Kazuyoshi Yata; Makoto Aoshima

Methodology 2023-05-29 v2 Statistics Theory

Authors: Kazuyoshi Yata, Makoto Aoshima

Abstract

Sparse principal component analysis (SPCA) methods have proven effective for analyzing high-dimensional data. Among them, threshold-based SPCA (TSPCA) is computationally cheaper than regularized SPCA based on L1 penalties. We investigate the efficacy of TSPCA in high-dimensional settings and show that, for a suitable threshold value, TSPCA performs satisfactorily. Its performance therefore depends heavily on the selected threshold value. To address this dependence, we propose a novel thresholding estimator of the principal component (PC) directions that uses a customized noise-reduction methodology. The proposed technique is consistent under mild conditions and is unaffected by the choice of threshold value, so it yields more accurate results quickly and at a lower computational cost. Furthermore, we explore the shrinkage PC directions and their application to clustering high-dimensional data. Finally, we evaluate the performance of the estimated shrinkage PC directions in actual data analyses.
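To make the threshold dependence concrete, the following is a minimal sketch of generic hard-thresholding of the leading PC direction, assuming NumPy. It is not the estimator proposed in the paper and omits the noise-reduction step; the function name `thresholded_pc_direction`, the simulated data, and the threshold of 0.2 are all hypothetical choices made for illustration.

```python
import numpy as np


def thresholded_pc_direction(X, threshold):
    """Hard-threshold the leading PC direction of X (n samples x d features).

    Illustrative only: a generic thresholding step, not the paper's estimator.
    """
    Xc = X - X.mean(axis=0)  # center each feature
    # The first right singular vector of the centered data is the
    # leading PC direction (unit length, sign arbitrary).
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    h = vt[0]
    # Zero out loadings whose magnitude falls below the threshold.
    h_sparse = np.where(np.abs(h) >= threshold, h, 0.0)
    norm = np.linalg.norm(h_sparse)
    if norm == 0.0:
        raise ValueError("threshold removed every loading")
    return h_sparse / norm  # rescale to unit length


# Simulated high-dimensional data (assumed setup): d = 200 features,
# n = 20 samples, and a signal direction supported on the first 5 features.
rng = np.random.default_rng(0)
n, d, k = 20, 200, 5
true_dir = np.zeros(d)
true_dir[:k] = 1.0 / np.sqrt(k)
scores = rng.normal(scale=10.0, size=(n, 1))
X = scores * true_dir + rng.normal(size=(n, d))

h_hat = thresholded_pc_direction(X, threshold=0.2)
support = np.flatnonzero(h_hat)  # indices of the retained loadings
print("retained features:", support)
```

Rerunning the sketch with a much smaller or much larger `threshold` changes which loadings survive, which is the sensitivity the abstract describes.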

Keywords

principal component analysis; discriminant analysis and canonical correlation analysis; sparse signal estimation

Cite

@article{arxiv.2209.14891,
  title  = {Automatic sparse PCA for high-dimensional data},
  author = {Kazuyoshi Yata and Makoto Aoshima},
  journal= {arXiv preprint arXiv:2209.14891},
  year   = {2023}
}
