Testing Convex Truncation

Data Structures and Algorithms; Computational Complexity; Probability; Statistics Theory
2024-11-25 (v2)

Abstract

We study the basic statistical problem of testing whether normally distributed $n$-dimensional data has been truncated, i.e., altered by retaining only the points that lie in some unknown truncation set $S \subseteq \mathbb{R}^n$. Our main algorithmic results are as follows. (1) We give a computationally efficient $O(n)$-sample algorithm that can distinguish the standard normal distribution $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary convex set $S$. (2) We give a different computationally efficient $O(n)$-sample algorithm that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary mixture of symmetric convex sets. These results stand in sharp contrast with known results for learning or testing convex bodies with respect to the normal distribution, or for learning convex-truncated normal distributions, where state-of-the-art algorithms require essentially $n^{\sqrt{n}}$ samples. An easy argument shows that no finite number of samples suffices to distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary mixture of general (not necessarily symmetric) convex sets, so no common generalization of results (1) and (2) is possible. We also prove that any algorithm (computationally efficient or otherwise) that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown symmetric convex set must use $\Omega(n)$ samples. This shows that the sample complexity of each of our algorithms is optimal up to a constant factor.
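To make the testing problem concrete, here is a minimal sketch in Python. It is not the algorithm from the paper. It generates samples from $N(0,I_n)$ conditioned on a set $S$ by rejection sampling, which matches the definition of truncation above: keep only the Gaussian points that lie in $S$. It then applies a simple one-sided statistic based on the mean squared norm. All function names, the membership-oracle interface `in_set`, the slab example, and the threshold value are illustrative assumptions. The statistic is a hypothetical stand-in chosen only because its null behavior is easy to state. Under $N(0,I_n)$, $\|x\|^2$ is chi-squared with $n$ degrees of freedom, so its empirical mean over $m$ samples has standard deviation $\sqrt{2n/m}$. That spread is $O(1)$ when $m$ is proportional to $n$.

```python
import random

# Hypothetical helper names throughout: this illustrates the testing setup
# (sampling from N(0, I_n) with and without truncation), not the paper's tester.


def sample_standard_normal(n, rng):
    """One draw from N(0, I_n): n independent standard normal coordinates."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def sample_truncated(n, in_set, rng, max_tries=100_000):
    """One draw from N(0, I_n) conditioned on S, via rejection sampling:
    draw Gaussian points and return the first one that lies in S.
    `in_set` is a membership oracle for S (an assumption of this sketch)."""
    for _ in range(max_tries):
        x = sample_standard_normal(n, rng)
        if in_set(x):
            return x
    raise RuntimeError("S has too little Gaussian mass for rejection sampling")


def mean_sq_norm_statistic(samples):
    """Empirical mean of ||x||^2 minus n.

    Under N(0, I_n), ||x||^2 is chi-squared with n degrees of freedom
    (mean n, variance 2n), so over m samples this statistic has mean 0
    and standard deviation sqrt(2n / m).
    """
    n = len(samples[0])
    avg = sum(sum(v * v for v in x) for x in samples) / len(samples)
    return avg - n


def norm_test(samples, threshold):
    """Illustrative one-sided tester: return True ("truncated") when the
    mean squared norm falls more than `threshold` below n."""
    return mean_sq_norm_statistic(samples) < -threshold


def in_slab(x):
    """Symmetric convex set S = {x : |x_1| <= 1} (example choice)."""
    return abs(x[0]) <= 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 20, 4000
    null = [sample_standard_normal(n, rng) for _ in range(m)]
    trunc = [sample_truncated(n, in_slab, rng) for _ in range(m)]
    print("statistic (no truncation):", mean_sq_norm_statistic(null))
    print("statistic (slab truncation):", mean_sq_norm_statistic(trunc))
    print("flags:", norm_test(null, 0.35), norm_test(trunc, 0.35))
```

Conditioning on the slab replaces $E[x_1^2] = 1$ with roughly $0.29$, so the statistic's mean drops to about $-0.71$. With $n = 20$ and $m = 4000$ its spread is about $0.1$, which is why a threshold of $0.35$ separates the two cases. The statistic is blind to some convex truncations. For the halfspace $\{x : x_1 \ge 0\}$, $E[x_1^2 \mid x_1 \ge 0] = 1$, so $E\|x\|^2$ stays at $n$ and only the mean shifts. Any tester for arbitrary convex $S$ therefore needs more than this one quantity.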

Cite

@article{arxiv.2305.03146,
  title  = {Testing Convex Truncation},
  author = {Anindya De and Shivam Nadimpalli and Rocco A. Servedio},
  journal= {arXiv preprint arXiv:2305.03146},
  year   = {2024}
}

Comments

Preliminary version in SODA 2023; v3 includes a simpler and stronger lower bound than v2. 26 pages