A Sub-Quadratic Time Algorithm for Robust Sparse Mean Estimation

Data Structures and Algorithms 2024-03-08 v1 Machine Learning Statistics Theory

Abstract

We study the algorithmic problem of sparse mean estimation in the presence of adversarial outliers. Specifically, the algorithm observes a \emph{corrupted} set of samples from $\mathcal{N}(\mu,\mathbf{I}_d)$, where the unknown mean $\mu \in \mathbb{R}^d$ is constrained to be $k$-sparse. A series of prior works has developed efficient algorithms for robust sparse mean estimation with sample complexity $\mathrm{poly}(k,\log d, 1/\epsilon)$ and runtime $d^2 \mathrm{poly}(k,\log d,1/\epsilon)$, where $\epsilon$ is the fraction of contamination. In particular, the fastest runtime of existing algorithms is quadratic ($\Omega(d^2)$), which can be prohibitive in high dimensions. This quadratic barrier in the runtime stems from the reliance of these algorithms on the sample covariance matrix, which is of size $d^2$. Our main contribution is an algorithm for robust sparse mean estimation which runs in \emph{subquadratic} time using $\mathrm{poly}(k,\log d,1/\epsilon)$ samples. We also provide analogous results for robust sparse PCA. Our results build on algorithmic advances in detecting weak correlations, a generalized version of the light-bulb problem by Valiant.
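
The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch of the problem setup only, not the algorithm from the paper: it draws samples from a Gaussian with identity covariance and a $k$-sparse mean, replaces an $\epsilon$-fraction with outliers, and compares the empirical mean of the corrupted data against the mean of the clean samples alone. The outlier model (a constant shift in every coordinate), the parameter values, and all function names are assumptions chosen for illustration; a real adversary can be far more subtle.

```python
import random


def sparse_mean(d, k, magnitude=1.0):
    """Return a k-sparse vector in R^d (hypothetical choice: first k entries nonzero)."""
    return [magnitude if i < k else 0.0 for i in range(d)]


def corrupted_samples(mu, n, eps, rng, outlier_shift=10.0):
    """Draw n samples from N(mu, I_d), then overwrite an eps-fraction with outliers.

    The outliers are an assumed, very simple corruption: every coordinate is
    set to `outlier_shift`. This is only one of many possible adversaries.
    """
    d = len(mu)
    samples = [[rng.gauss(m, 1.0) for m in mu] for _ in range(n)]
    n_bad = int(eps * n)
    for i in range(n_bad):
        samples[i] = [outlier_shift] * d
    return samples


def empirical_mean(samples):
    """Coordinate-wise average of a list of equal-length vectors."""
    n = len(samples)
    d = len(samples[0])
    return [sum(x[j] for x in samples) / n for j in range(d)]


def l2_error(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


rng = random.Random(0)
d, k, n, eps = 200, 5, 500, 0.1

mu = sparse_mean(d, k)
data = corrupted_samples(mu, n, eps, rng)

# Error of the naive estimator on the corrupted data.
naive_error = l2_error(empirical_mean(data), mu)

# Error when the outliers are removed by an oracle (not available in practice).
clean_error = l2_error(empirical_mean(data[int(eps * n):]), mu)

# Number of entries in the d x d sample covariance matrix, the object whose
# size the abstract identifies as the source of the quadratic runtime barrier.
covariance_entries = d * d

print(naive_error, clean_error, covariance_entries)
```

In this toy setup the naive mean is pulled far from $\mu$ by the outliers, while the oracle-cleaned mean stays close; the last quantity simply counts the $d^2$ entries a covariance-based method would have to touch.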

Cite

@article{arxiv.2403.04726,
  title   = {A Sub-Quadratic Time Algorithm for Robust Sparse Mean Estimation},
  author  = {Ankit Pensia},
  journal = {arXiv preprint arXiv:2403.04726},
  year    = {2024}
}