Related papers: Isotropic PCA and Affine-Invariant Clustering

Robustly Clustering a Mixture of Gaussians

We give an efficient algorithm for robustly clustering a mixture of two arbitrary Gaussians, a central open problem in the theory of computationally efficient robust estimation, assuming only that the means of the component Gaussians…

Data Structures and Algorithms · Computer Science 2020-06-02 He Jia, Santosh Vempala

Efficient Clustering for Stretched Mixtures: Landscape and Optimality

This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions and aims for a classifier to estimate the labels. Many popular methods including PCA and…

Machine Learning · Statistics 2021-11-30 Kaizheng Wang, Yuling Yan, Mateo Díaz

Subspace clustering of high-dimensional data: a predictive approach

In several application domains, high-dimensional observations are collected and then analysed in search of naturally occurring data clusters which might provide further insights about the nature of the problem. In this paper we describe a…

Machine Learning · Statistics 2012-03-07 Brian McWilliams, Giovanni Montana

Thresholding based Efficient Outlier Robust PCA

We consider the problem of outlier robust PCA (OR-PCA) where the goal is to recover principal directions despite the presence of outlier data points. That is, given a data matrix $M^*$, where $(1-\alpha)$ fraction of the points are noisy…

Machine Learning · Computer Science 2017-02-21 Yeshwanth Cherapanamjeri, Prateek Jain, Praneeth Netrapalli

Spectral Clustering Based on Local PCA

We propose a spectral clustering method based on local principal components analysis (PCA). After performing local PCA in selected neighborhoods, the algorithm builds a nearest neighbor graph weighted according to a discrepancy between the…

Machine Learning · Statistics 2019-04-09 Ery Arias-Castro, Gilad Lerman, Teng Zhang

Fourier PCA and Robust Tensor Decomposition

Fourier PCA is Principal Component Analysis of a matrix obtained from higher order derivatives of the logarithm of the Fourier transform of a distribution. We make this method algorithmic by developing a tensor decomposition method for a…

Machine Learning · Computer Science 2014-07-01 Navin Goyal, Santosh Vempala, Ying Xiao

Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation

We study the clustering problem for mixtures of bounded covariance distributions, under a fine-grained separation assumption. Specifically, given samples from a $k$-component mixture distribution $D = \sum_{i =1}^k w_i P_i$, where each $w_i…

Machine Learning · Computer Science 2023-12-20 Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Thanasis Pittas

Solving clustering as ill-posed problem: experiments with K-Means algorithm

In this contribution, the clustering procedure based on the K-Means algorithm is studied as an inverse problem, which is a special case of the ill-posed problems. The attempts to improve the quality of the clustering inverse problem drive to…

Numerical Analysis · Mathematics 2022-11-16 Alberto Arturo Vergani

An Affine-Invariant Bayesian Cluster Process

In order to identify clusters of objects with features transformed by unknown affine transformations, we develop a Bayesian cluster process which is invariant with respect to certain linear transformations of the feature space and able to…

Methodology · Statistics 2016-12-01 Hsin-Hsiung Huang, Jie Yang

Multiscale principal component analysis

Principal component analysis (PCA) is an important tool in exploring data. The conventional approach to PCA leads to a solution which favours the structures with large variances. This is sensitive to outliers and could obfuscate interesting…

Methodology · Statistics 2015-06-16 A. A. Akinduko, A. N. Gorban

Few-Round Distributed Principal Component Analysis: Closing the Statistical Efficiency Gap by Consensus

Distributed algorithms and theories are called for in this era of big data. Under weaker local signal-to-noise ratios, we improve upon the celebrated one-round distributed principal component analysis (PCA) algorithm designed in the spirit…

Methodology · Statistics 2025-07-01 ZeYu Li, Xinsheng Zhang, Wang Zhou

Influential Feature PCA for high dimensional clustering

We consider a clustering problem where we observe feature vectors $X_i \in R^p$, $i = 1, 2, \ldots, n$, from $K$ possible classes. The class labels are unknown and the main interest is to estimate them. We are primarily interested in the…

Methodology · Statistics 2015-12-17 Jiashun Jin, Wanjie Wang

Principal Component Analysis in Space Forms

Principal Component Analysis (PCA) is a workhorse of modern data science. While PCA assumes the data conforms to Euclidean geometry, for specific data types, such as hierarchical and cyclic data structures, other spaces are more…

Machine Learning · Statistics 2024-07-11 Puoya Tabaghi, Michael Khanzadeh, Yusu Wang, Sivash Mirarab

Robustly Learning any Clusterable Mixture of Gaussians

We study the efficient learnability of high-dimensional Gaussian mixtures in the outlier-robust setting, where a small constant fraction of the data is adversarially corrupted. We resolve the polynomial learnability of this problem when the…

Data Structures and Algorithms · Computer Science 2020-05-14 Ilias Diakonikolas, Samuel B. Hopkins, Daniel Kane, Sushrut Karmalkar

Communication-efficient distributed eigenspace estimation

Distributed computing is a standard way to scale up machine learning and data science algorithms to process large amounts of data. In such settings, avoiding communication amongst machines is paramount for achieving high performance. Rather…

Machine Learning · Statistics 2021-05-04 Vasileios Charisopoulos, Austin R. Benson, Anil Damle

Fourier-Bessel rotational invariant eigenimages

We present an efficient and accurate algorithm for principal component analysis (PCA) of a large set of two dimensional images, and, for each image, the set of its uniform rotations in the plane and its reflection. The algorithm starts by…

Computer Vision and Pattern Recognition · Computer Science 2014-02-17 Zhizhen Zhao, Amit Singer

A random version of principal component analysis in data clustering

Principal component analysis (PCA) is a widespread technique for data analysis that relies on the covariance-correlation matrix of the analyzed data. However, to work properly with high-dimensional data, PCA poses severe mathematical…

Quantitative Methods · Quantitative Biology 2018-10-18 Luigi Leonardo Palese

Fair Clustering via Alignment

Algorithmic fairness in clustering aims to balance the proportions of instances assigned to each cluster with respect to a given sensitive attribute. While recently developed fair clustering algorithms optimize clustering objectives under…

Machine Learning · Computer Science 2025-10-24 Kunwoong Kim, Jihu Lee, Sangchul Park, Yongdai Kim

Deep Transformation-Invariant Clustering

Recent advances in image clustering typically focus on learning better deep representations. In contrast, we present an orthogonal approach that does not rely on abstract features but instead learns to predict image transformations and…

Computer Vision and Pattern Recognition · Computer Science 2020-10-29 Tom Monnier, Thibault Groueix, Mathieu Aubry

Leveraging Union of Subspace Structure to Improve Constrained Clustering

Many clustering problems in computer vision and other contexts are also classification problems, where each cluster shares a meaningful label. Subspace clustering algorithms in particular are often applied to problems that fit this…

Machine Learning · Computer Science 2017-09-15 John Lipor, Laura Balzano