Generative Principal Component Analysis

2022-09-08 (v2). Subjects: Machine Learning, Information Theory (math.IT), Statistics Theory

Abstract

In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the underlying signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a quadratic estimator, and show that it enjoys a statistical rate of order $\sqrt{\frac{k\log L}{m}}$, where $m$ is the number of samples. We also provide a near-matching algorithm-independent lower bound. Moreover, we provide a variant of the classic power method, which projects the calculated data onto the range of the generative model during each iteration. We show that under suitable conditions, this method converges exponentially fast to a point achieving the above-mentioned statistical rate. We perform experiments on various image datasets for spiked matrix and phase retrieval models, and illustrate the performance gains of our method over the classic power method and the truncated power method devised for sparse principal component analysis.
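To make the projected power iteration described in the abstract concrete, the following is a minimal sketch under strong simplifying assumptions, not the authors' implementation. It assumes a hypothetical linear generative model $G(z) = Az$, so the projection onto the range of $G$ is an exact orthogonal projection onto a subspace followed by normalization; for a neural-network generative model the projection step would typically have to be approximated (for example by gradient descent over the latent input). The function names, the spiked-covariance data generation, and all parameter values (`n`, `k`, `m`, `beta`) are illustrative choices that do not come from the source.

```python
import numpy as np


def project_onto_range(w, Q):
    """Project w onto span(Q) (Q has orthonormal columns) and normalize.

    Stand-in for projecting onto the range of a generative model; exact only
    because the assumed model G(z) = A z is linear.
    """
    p = Q @ (Q.T @ w)
    norm = np.linalg.norm(p)
    if norm == 0.0:
        raise ValueError("projection is zero; choose a different starting point")
    return p / norm


def projected_power_method(V, Q, w0, num_iters=50):
    """Power iteration on V with a projection step after every multiplication."""
    w = project_onto_range(w0, Q)
    for _ in range(num_iters):
        w = project_onto_range(V @ w, Q)
    return w


def demo(seed=0, n=200, k=5, m=2000, beta=2.0):
    """Spiked-covariance toy problem (illustrative parameters only)."""
    rng = np.random.default_rng(seed)

    # Hypothetical linear generative model: orthonormal basis of range(A).
    A = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A)

    # Unit-norm ground-truth signal lying in the range of the model.
    x = Q @ rng.standard_normal(k)
    x /= np.linalg.norm(x)

    # Samples y_i = sqrt(beta) * g_i * x + noise_i, with empirical covariance V.
    g = rng.standard_normal(m)
    Y = np.sqrt(beta) * g[:, None] * x[None, :] + rng.standard_normal((m, n))
    V = Y.T @ Y / m

    w = projected_power_method(V, Q, rng.standard_normal(n))
    return w, x, Q


if __name__ == "__main__":
    w, x, _ = demo()
    # The sign of a principal direction is not identifiable, so compare |<w, x>|.
    print("alignment |<w, x>| =", abs(w @ x))
```

Dropping the projection (replacing `project_onto_range` with plain normalization) recovers the classic power method, which gives a simple baseline for comparison on the same matrix `V`.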

Cite

@article{arxiv.2203.09693,
  title  = {Generative Principal Component Analysis},
  author = {Zhaoqiang Liu and Jiulong Liu and Subhroshekhar Ghosh and Jun Han and Jonathan Scarlett},
  journal= {arXiv preprint arXiv:2203.09693},
  year   = {2022}
}

Comments

ICLR 2022 paper + additional appendix on algorithm-independent lower bounds + corrected experimental results for the Fashion-MNIST dataset
