Federated Principal Component Analysis

Machine Learning, Information Theory (math.IT). 2020-10-26, v3

Abstract

We present a federated, asynchronous, and $(\varepsilon, \delta)$-differentially private algorithm for PCA in the memory-limited setting. Our algorithm incrementally computes local model updates using a streaming procedure and adaptively estimates its $r$ leading principal components when only $\mathcal{O}(dr)$ memory is available, with $d$ being the dimensionality of the data. We guarantee differential privacy via an input-perturbation scheme in which the covariance matrix of a dataset $\mathbf{X} \in \mathbb{R}^{d \times n}$ is perturbed with a non-symmetric random Gaussian matrix with variance in $\mathcal{O}\left(\left(\frac{d}{n}\right)^2 \log d \right)$, thus improving upon the state-of-the-art. Furthermore, contrary to previous federated or distributed algorithms for PCA, our algorithm is also invariant to permutations in the incoming data, which provides robustness against straggler or failed nodes. Numerical simulations show that, while using limited memory, our algorithm exhibits performance that closely matches or outperforms traditional non-federated algorithms, and in the absence of communication latency, it exhibits attractive horizontal scalability.
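The input-perturbation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the covariance is taken as $\frac{1}{n}\mathbf{X}\mathbf{X}^\top$, and the function name `perturbed_covariance` and the scale factor `c` are hypothetical. The abstract gives only the order of the noise variance, so `c` stands in for the unspecified constants, including any dependence on $\varepsilon$ and $\delta$.

```python
import math
import random


def perturbed_covariance(X, c=1.0, seed=None):
    """Return a noisy covariance of X (d rows, n columns) and the noise variance used.

    Assumption: covariance is (1/n) * X * X^T. The constant `c` is a
    hypothetical placeholder; the abstract only states that the variance
    is in O((d/n)^2 * log d).
    """
    rng = random.Random(seed)
    d, n = len(X), len(X[0])

    # Noise variance of the order stated in the abstract, scaled by c.
    variance = c * (d / n) ** 2 * math.log(d)
    sigma = math.sqrt(variance)

    # Empirical d x d covariance (1/n) * X * X^T.
    cov = [
        [sum(X[i][k] * X[j][k] for k in range(n)) / n for j in range(d)]
        for i in range(d)
    ]

    # Every entry gets its own independent Gaussian draw, so the noise
    # matrix is non-symmetric: entry (i, j) and entry (j, i) differ.
    noisy = [
        [cov[i][j] + rng.gauss(0.0, sigma) for j in range(d)]
        for i in range(d)
    ]
    return noisy, variance
```

Because the noise is drawn independently per entry, the perturbed matrix is generally not symmetric, which is the property the abstract contrasts with symmetric-noise schemes. A real deployment would derive the noise scale from the privacy budget rather than from a free constant.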

Cite

@article{arxiv.1907.08059,
  title  = {Federated Principal Component Analysis},
  author = {Andreas Grammenos and Rodrigo Mendoza-Smith and Jon Crowcroft and Cecilia Mascolo},
  journal= {arXiv preprint arXiv:1907.08059},
  year   = {2020}
}

Comments

36 pages, 13 figures, 1 table. Accepted for publication at Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada