Efficient Sparse PCA via Block-Diagonalization

Machine Learning 2025-03-06 v2 Optimization and Control Machine Learning

Abstract

Sparse Principal Component Analysis (Sparse PCA) is a pivotal tool in data analysis and dimensionality reduction. However, Sparse PCA is a challenging problem in both theory and practice: it is known to be NP-hard, and current exact methods generally require exponential runtime. In this paper, we propose a novel framework to efficiently approximate Sparse PCA by (i) approximating the general input covariance matrix with a re-sorted block-diagonal matrix, (ii) solving the Sparse PCA sub-problem in each block, and (iii) reconstructing the solution to the original problem. Our framework is simple and powerful: it can leverage any off-the-shelf Sparse PCA algorithm and achieve significant computational speedups, with a minor additive error that is linear in the approximation error of the block-diagonal matrix. Suppose $g(k, d)$ is the runtime of an algorithm (approximately) solving Sparse PCA in dimension $d$ and with sparsity constant $k$. Our framework, when integrated with this algorithm, reduces the runtime to $\mathcal{O}\left(\frac{d}{d^\star} \cdot g(k, d^\star) + d^2\right)$, where $d^\star \leq d$ is the largest block size of the block-diagonal matrix. For instance, integrating our framework with the Branch-and-Bound algorithm reduces the complexity from $g(k, d) = \mathcal{O}(k^3\cdot d^k)$ to $\mathcal{O}(k^3\cdot d \cdot (d^\star)^{k-1})$, demonstrating exponential speedups if $d^\star$ is small. We perform large-scale evaluations on many real-world datasets: for an exact Sparse PCA algorithm, our method achieves an average speedup factor of 100.50 while maintaining an average approximation error of 0.61%; for an approximate Sparse PCA algorithm, our method achieves an average speedup factor of 6.00 and an average approximation error of -0.91%, meaning that our method often finds better solutions.
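The three steps of the framework can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the block-diagonal approximation is built, so the sketch assumes one plausible construction: zero out entries with magnitude at most a threshold `eps`, then treat the connected components of the remaining support graph as blocks. The names `find_blocks`, `block_sparse_pca`, and `brute_force_sparse_pca` are hypothetical, and a brute-force enumeration stands in for "any off-the-shelf Sparse PCA algorithm". Solving each sub-problem on the original (unthresholded) entries of the block is also an assumption.

```python
import itertools
import numpy as np


def brute_force_sparse_pca(A, k):
    """Stand-in exact solver: maximize x^T A x over unit vectors with
    at most k nonzeros by enumerating all supports of size min(k, d)."""
    d = A.shape[0]
    k = min(k, d)
    best_val, best_x = -np.inf, None
    for support in itertools.combinations(range(d), k):
        idx = list(support)
        # eigh returns eigenvalues in ascending order, so the last one is the largest.
        vals, vecs = np.linalg.eigh(A[np.ix_(idx, idx)])
        if vals[-1] > best_val:
            best_val = vals[-1]
            best_x = np.zeros(d)
            best_x[idx] = vecs[:, -1]
    return best_val, best_x


def find_blocks(A, eps):
    """Step (i), under the thresholding assumption: group indices into the
    connected components of the graph with an edge (i, j) iff |A[i, j]| > eps.
    A is assumed symmetric."""
    d = A.shape[0]
    adjacency = np.abs(A) > eps
    visited = np.zeros(d, dtype=bool)
    blocks = []
    for start in range(d):
        if visited[start]:
            continue
        visited[start] = True
        stack, block = [start], []
        while stack:  # depth-first search over one component
            i = stack.pop()
            block.append(i)
            for j in np.flatnonzero(adjacency[i]):
                if not visited[j]:
                    visited[j] = True
                    stack.append(int(j))
        blocks.append(sorted(block))
    return blocks


def block_sparse_pca(A, k, eps, solver=brute_force_sparse_pca):
    """Steps (ii) and (iii): solve Sparse PCA inside each block, keep the
    best block solution, and embed it back into R^d with zero padding."""
    d = A.shape[0]
    best_val, best_x = -np.inf, None
    for block in find_blocks(A, eps):
        val, x_block = solver(A[np.ix_(block, block)], k)
        if val > best_val:
            best_val = val
            best_x = np.zeros(d)
            best_x[block] = x_block
    return best_val, best_x


# Toy covariance: two strongly correlated 3x3 groups, weak coupling (0.01) between them.
A = np.full((6, 6), 0.01)
A[:3, :3] = [[4, 2, 1], [2, 4, 1], [1, 1, 3]]
A[3:, 3:] = [[5, 1, 0.5], [1, 3, 0.5], [0.5, 0.5, 2]]

val, x = block_sparse_pca(A, k=2, eps=0.05)
print(find_blocks(A, eps=0.05), val, np.flatnonzero(x))
```

In this sketch the solver is called once per block on a matrix of size at most $d^\star$, and the block search touches each of the $d^2$ entries a constant number of times, which mirrors the shape of the runtime bound stated above. Passing a different callable as `solver` swaps in another Sparse PCA routine without changing the rest of the pipeline.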

Cite

@article{arxiv.2410.14092,
  title  = {Efficient Sparse PCA via Block-Diagonalization},
  author = {Alberto Del Pia and Dekun Zhou and Yinglun Zhu},
  journal= {arXiv preprint arXiv:2410.14092},
  year   = {2025}
}

Comments

29 pages, 1 figure