Related papers: Computationally efficient sparse clustering

Computational lower bounds in latent models: clustering, sparse-clustering, biclustering

In many high-dimensional problems, like sparse-PCA, planted clique, or clustering, the best known algorithms with polynomial time complexity fail to reach the statistical performance provably achievable by algorithms free of computational…

Statistics Theory · Mathematics 2025-06-17 Bertrand Even, Christophe Giraud, Nicolas Verzelen

Computational Lower Bounds for Sparse PCA

In the context of sparse principal component detection, we bring evidence towards the existence of a statistical price to pay for computational efficiency. We measure the performance of a test by the smallest signal strength that it can…

Statistics Theory · Mathematics 2013-04-29 Quentin Berthet, Philippe Rigollet

Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation

While several papers have investigated computationally and statistically efficient methods for learning Gaussian mixtures, precise minimax bounds for their statistical performance as well as fundamental limits in high-dimensional settings…

Machine Learning · Statistics 2013-06-11 Martin Azizyan, Aarti Singh, Larry Wasserman

Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap

A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model where the cluster means are sparse vectors. Here we provide an exact asymptotic characterization of the statistically optimal reconstruction…

Machine Learning · Statistics 2023-04-04 Luca Pesce, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

Bayesian Sparse Gaussian Mixture Model in High Dimensions

We study the sparse high-dimensional Gaussian mixture model when the number of clusters is allowed to grow with the sample size. A minimax lower bound for parameter estimation is established, and we show that a constrained maximum…

Statistics Theory · Mathematics 2024-02-26 Dapeng Yao, Fangzheng Xie, Yanxun Xu

Computational-Statistical Gaps for Improper Learning in Sparse Linear Regression

We study computational-statistical gaps for improper learning in sparse linear regression. More specifically, given $n$ samples from a $k$-sparse linear model in dimension $d$, we ask what is the minimum sample complexity to efficiently (in…

Machine Learning · Computer Science 2024-06-26 Rares-Darius Buhai, Jingqiu Ding, Stefan Tiegel

Sparsity-aware Possibilistic Clustering Algorithms

In this paper two novel possibilistic clustering algorithms are presented, which utilize the concept of sparsity. The first one, called sparse possibilistic c-means, exploits sparsity and can deal well with closely located clusters that may…

Computer Vision and Pattern Recognition · Computer Science 2015-10-16 Spyridoula D. Xenaki, Konstantinos D. Koutroumbas, Athanasios A. Rontogiannis

Clustering and Feature Selection using Sparse Principal Component Analysis

In this paper, we study the application of sparse principal component analysis (PCA) to clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of…

Artificial Intelligence · Computer Science 2008-10-08 Ronny Luss, Alexandre d'Aspremont

Clustering under Perturbation Resilience

Motivated by the fact that distances between data points in many real-world clustering instances are often based on heuristic measures, Bilu and Linial [BL] proposed analyzing objective based clustering problems under the assumption…

Machine Learning · Computer Science 2016-12-13 Maria-Florina Balcan, Yingyu Liang

Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization

We study the problem of detecting a structured, low-rank signal matrix corrupted with additive Gaussian noise. This includes clustering in a Gaussian mixture model, sparse PCA, and submatrix localization. Each of these problems is…

Statistics Theory · Mathematics 2017-01-24 Jess Banks, Cristopher Moore, Nicolas Verzelen, Roman Vershynin, Jiaming Xu

Clustering under Local Stability: Bridging the Gap between Worst-Case and Beyond Worst-Case Analysis

Recently, there has been substantial interest in clustering research that takes a beyond worst-case approach to the analysis of algorithms. The typical idea is to design a clustering algorithm that outputs a near-optimal solution, provided…

Data Structures and Algorithms · Computer Science 2018-12-31 Maria-Florina Balcan, Colin White

Statistical Query Algorithms and Low-Degree Tests Are Almost Equivalent

Researchers currently use a number of approaches to predict and substantiate information-computation gaps in high-dimensional statistical estimation problems. A prominent approach is to characterize the limits of restricted models of…

Computational Complexity · Computer Science 2021-06-29 Matthew Brennan, Guy Bresler, Samuel B. Hopkins, Jerry Li, Tselil Schramm

Sparse Subspace Clustering: Algorithm, Theory, and Applications

In many real-world problems, we are dealing with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, high-dimensional data lie close to low-dimensional structures…

Computer Vision and Pattern Recognition · Computer Science 2013-02-06 Ehsan Elhamifar, Rene Vidal

Accuracy and Robustness of Clustering Algorithms for Small-Size Applications in Bioinformatics

The performance (accuracy and robustness) of several clustering algorithms is studied for linearly dependent random variables in the presence of noise. It turns out that the error percentage quickly increases when the number of observations…

Applications · Statistics 2009-11-13 Pamela Minicozzi, Fabio Rapallo, Enrico Scalas, Francesco Dondero

Settling Time vs. Accuracy Tradeoffs for Clustering Big Data

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly…

Machine Learning · Computer Science 2024-04-03 Andrew Draganov, David Saulpic, Chris Schwiegelshohn

Consistent spectral clustering in sparse tensor block models

High-order clustering aims to classify objects in multiway datasets that are prevalent in various fields such as bioinformatics, recommendation systems, and social network analysis. Such data are often sparse and high-dimensional, posing…

Statistics Theory · Mathematics 2025-12-05 Ian Välimaa, Lasse Leskelä

Phase Transitions for High Dimensional Clustering and Related Problems

Consider a two-class clustering problem where we observe $X_i = \ell_i \mu + Z_i$, $Z_i \stackrel{iid}{\sim} N(0, I_p)$, $1 \leq i \leq n$. The feature vector $\mu\in R^p$ is unknown but is presumably sparse. The class labels…

Statistics Theory · Mathematics 2016-06-09 Jiashun Jin, Zheng Tracy Ke, Wanjie Wang

Simultaneous Estimation of Number of Clusters and Feature Sparsity in Clustering High-Dimensional Data

Estimating the number of clusters (K) is a critical and often difficult task in cluster analysis. Many methods have been proposed to estimate K, including some top performers using resampling approach. When performing cluster analysis in…

Methodology · Statistics 2019-09-05 Yujia Li, Xiangrui Zeng, Chien-Wei Lin, George Tseng

Achieving stable subspace clustering by post-processing generic clustering results

We propose an effective subspace selection scheme as a post-processing step to improve results obtained by sparse subspace clustering (SSC). Our method starts by the computation of stable subspaces using a novel random sampling scheme. Thus…

Computer Vision and Pattern Recognition · Computer Science 2016-05-30 Duc-Son Pham, Ognjen Arandjelovic, Svetha Venkatesh

Analysis of Sparse Subspace Clustering: Experiments and Random Projection

Clustering can be defined as the process of assembling objects into a number of groups whose elements are similar to each other in some manner. As a technique that is used in many domains, such as face clustering, plant categorization,…

Machine Learning · Computer Science 2022-04-05 Mehmet F. Demirel, Enrico Au-Yeung