High-dimensional estimation with missing data: Statistical and computational limits

Kabir Aladin Verchand; Ankit Pensia; Saminul Haque; Rohith Kuditipudi

Statistics Theory | 2026-03-18 | v1 | Data Structures and Algorithms | Machine Learning

Abstract

We consider computationally efficient estimation of population parameters when observations are subject to missing data. In particular, we consider estimation under the realizable contamination model of missing data, in which an $\epsilon$ fraction of the observations are subject to an arbitrary (and unknown) missing not at random (MNAR) mechanism. When the true data is Gaussian, we provide evidence of statistical-computational gaps in several problems. For mean estimation in $\ell_2$ norm, we show that to obtain error at most $\rho$, for any constant contamination $\epsilon \in (0, 1)$, (roughly) $n \gtrsim d e^{1/\rho^2}$ samples are necessary, and that there is a computationally inefficient algorithm which achieves this error. On the other hand, we show that any computationally efficient method within certain popular families of algorithms requires a much larger sample complexity of (roughly) $n \gtrsim d^{1/\rho^2}$, and that there exists a polynomial-time algorithm based on sum-of-squares which (nearly) achieves this lower bound. For covariance estimation in relative operator norm, we show that a parallel development holds. Finally, we turn to linear regression with missing observations and show that such a gap does not persist. Indeed, in this setting we show that minimizing a simple, strongly convex empirical risk nearly achieves the information-theoretic lower bound in polynomial time.
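
To get a feel for the size of the gap in mean estimation, the two rough sample-complexity scalings quoted above can be compared numerically. The sketch below is only illustrative: it drops all constants and logarithmic factors hidden by $\gtrsim$, and the function names and the chosen values of $d$ and $\rho$ are hypothetical, not taken from the paper.

```python
import math


def info_theoretic_scaling(d: int, rho: float) -> float:
    """Rough information-theoretic scaling d * exp(1 / rho^2).

    Constants and log factors are ignored (assumption for illustration).
    """
    return d * math.exp(1.0 / rho**2)


def efficient_scaling(d: int, rho: float) -> float:
    """Rough scaling d^(1 / rho^2) for the efficient algorithm families.

    Constants and log factors are ignored (assumption for illustration).
    """
    return float(d) ** (1.0 / rho**2)


if __name__ == "__main__":
    d = 100  # hypothetical dimension
    for rho in (0.9, 0.7, 0.5):  # hypothetical target errors
        stat = info_theoretic_scaling(d, rho)
        comp = efficient_scaling(d, rho)
        print(f"rho={rho}: d*e^(1/rho^2) ~ {stat:.3g}, "
              f"d^(1/rho^2) ~ {comp:.3g}, ratio ~ {comp / stat:.3g}")
```

Under these simplifications, the first scaling stays linear in $d$ for fixed $\rho$, while the second grows as a polynomial in $d$ whose degree increases as $\rho$ shrinks, so the ratio between the two widens quickly for small target error.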

Keywords

covariance estimation; statistical estimation; nonparametric regression

Cite

@article{arxiv.2603.16712,
  title  = {High-dimensional estimation with missing data: Statistical and computational limits},
  author = {Kabir Aladin Verchand and Ankit Pensia and Saminul Haque and Rohith Kuditipudi},
  journal= {arXiv preprint arXiv:2603.16712},
  year   = {2026}
}
