Related papers: Communication-efficient Algorithms for Distributed…

Distributed Estimation of Principal Eigenspaces

Principal component analysis (PCA) is fundamental to statistical machine learning. It extracts latent principal factors that contribute to the most variation of the data. When data are stored across multiple machines, however, communication…

Computation · Statistics 2018-01-11 Jianqing Fan, Dong Wang, Kaizheng Wang, Ziwei Zhu

Communication-efficient distributed eigenspace estimation

Distributed computing is a standard way to scale up machine learning and data science algorithms to process large amounts of data. In such settings, avoiding communication amongst machines is paramount for achieving high performance. Rather…

Machine Learning · Statistics 2021-05-04 Vasileios Charisopoulos, Austin R. Benson, Anil Damle

Distributed Estimation for Principal Component Analysis: an Enlarged Eigenspace Analysis

The growing size of modern data sets brings many challenges to the existing statistical estimation approaches, which calls for new distributed methodologies. This paper studies distributed estimation for a fundamental statistical machine…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-04 Xi Chen, Jason D. Lee, He Li, Yun Yang

Communication Efficient Distributed Kernel Principal Component Analysis

Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. In the presence of a large volume of high dimensional data collected in a distributed fashion, it becomes very…

Machine Learning · Computer Science 2016-02-16 Maria-Florina Balcan, Yingyu Liang, Le Song, David Woodruff, Bo Xie

Distributed Principal Component Analysis with Limited Communication

We study efficient distributed algorithms for the fundamental problem of principal component analysis and leading eigenvector computation on the sphere, when the data are randomly distributed among a set of computational nodes. We propose a…

Optimization and Control · Mathematics 2021-10-28 Foivos Alimisis, Peter Davies, Bart Vandereycken, Dan Alistarh

Improved Distributed Principal Component Analysis

We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA),…

Machine Learning · Computer Science 2014-12-24 Maria-Florina Balcan, Vandana Kanchanapally, Yingyu Liang, David Woodruff

Few-Round Distributed Principal Component Analysis: Closing the Statistical Efficiency Gap by Consensus

Distributed algorithms and theories are called for in this era of big data. Under weaker local signal-to-noise ratios, we improve upon the celebrated one-round distributed principal component analysis (PCA) algorithm designed in the spirit…

Methodology · Statistics 2025-07-01 ZeYu Li, Xinsheng Zhang, Wang Zhou

FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis

Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning. While PCA is often thought of as a dimensionality reduction method, the purpose of PCA is actually two-fold: dimension reduction…

Machine Learning · Computer Science 2023-01-25 Arpita Gang, Waheed U. Bajwa

A Linearly Convergent Algorithm for Distributed Principal Component Analysis

Principal Component Analysis (PCA) is the workhorse tool for dimensionality reduction in this era of big data. While often overlooked, the purpose of PCA is not only to reduce data dimensionality, but also to yield features that are…

Machine Learning · Computer Science 2021-11-30 Arpita Gang, Waheed U. Bajwa

Online Distributed Estimation of Principal Eigenspaces

Principal components analysis (PCA) is a widely used dimension reduction technique with an extensive range of applications. In this paper, an online distributed algorithm is proposed for recovering the principal eigenspaces. We further…

Machine Learning · Statistics 2019-05-20 Davoud Ataee Tarzanagh, Mohamad Kazem Shirani Faradonbeh, George Michailidis

Non-negative Principal Component Analysis: Message Passing Algorithms and Sharp Asymptotics

Principal component analysis (PCA) aims at estimating the direction of maximal variability of a high-dimensional dataset. A natural question is: does this task become easier, and estimation more accurate, when we exploit additional…

Information Theory · Computer Science 2014-06-19 Andrea Montanari, Emile Richard

Efficient Distributed Estimation of Inverse Covariance Matrices

In distributed systems, communication is a major concern due to issues such as its vulnerability or efficiency. In this paper, we are interested in estimating sparse inverse covariance matrices when samples are distributed into different…

Methodology · Statistics 2016-10-04 Jesús Arroyo, Elizabeth Hou

Distributed Principal Component Analysis for Wireless Sensor Networks

The Principal Component Analysis (PCA) is a data dimensionality reduction technique well-suited for processing data from sensor networks. It can be applied to tasks like compression, event detection, and event recognition. This technique is…

Networking and Internet Architecture · Computer Science 2010-03-13 Yann-Aël Le Borgne, Sylvain Raybaud, Gianluca Bontempi

Robust covariance estimation for distributed principal component analysis

Fan et al. [$\mathit{Annals\ of\ Statistics}$ $\textbf{47}$(6) (2019) 3009-3031] constructed a distributed principal component analysis (PCA) algorithm to reduce the communication cost between multiple servers…

Statistics Theory · Mathematics 2021-10-07 Kangqiang Li, Han Bao, Lixin Zhang

Distributed Robust Principal Component Analysis

We study the robust principal component analysis (RPCA) problem in a distributed setting. The goal of RPCA is to find an underlying low-rank estimation for a raw data matrix when the data matrix is subject to the corruption of gross sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-16 Wenda Chu

Distributed Learning for Principle Eigenspaces without Moment Constraints

Distributed Principal Component Analysis (PCA) has been studied to deal with the case when data are stored across multiple machines and communication cost or privacy concerns prohibit the computation of PCA in a central location. However,…

Computation · Statistics 2022-05-02 Yong He, Zichen Liu, Yalin Wang

Quantifying the Estimation Error of Principal Components

Principal component analysis is an important pattern recognition and dimensionality reduction tool in many applications. Principal components are computed as eigenvectors of a maximum likelihood covariance $\widehat{\Sigma}$ that…

Statistics Theory · Mathematics 2017-10-30 Raphael Hauser, Raul Kangro, Jüri Lember, Heinrich Matzinger

Principal Component Analysis and Higher Correlations for Distributed Data

We consider algorithmic problems in the setting in which the input data has been partitioned arbitrarily on many servers. The goal is to compute a function of all the data, and the bottleneck is the communication used by the algorithm. We…

Data Structures and Algorithms · Computer Science 2014-07-01 Ravindran Kannan, Santosh Vempala, David Woodruff

Two derivations of Principal Component Analysis on datasets of distributions

In this brief note, we formulate Principal Component Analysis (PCA) over datasets consisting not of points but of distributions, characterized by their location and covariance. Just like the usual PCA on points can be equivalently derived…

Machine Learning · Statistics 2023-06-26 Vlad Niculae

Covariance matrix preparation for quantum principal component analysis

Principal component analysis (PCA) is a dimensionality reduction method in data analysis that involves diagonalizing the covariance matrix of the dataset. Recently, quantum algorithms have been formulated for PCA based on diagonalizing a…

Quantum Physics · Physics 2022-10-26 Max Hunter Gordon, M. Cerezo, Lukasz Cincio, Patrick J. Coles