Computing Approximate Statistical Discrepancy

Computational Geometry 2018-10-01 v3

Abstract

Consider a geometric range space $(X, \mathcal{A})$ where each data point $x \in X$ has two or more values (say $r(x)$ and $b(x)$). Also consider a function $\Phi(A)$, defined on any subset $A \in (X, \mathcal{A})$, of the sums of values in that range, e.g., $r_A = \sum_{x \in A} r(x)$ and $b_A = \sum_{x \in A} b(x)$. The $\Phi$-maximum range is $A^* = \arg\max_{A \in (X, \mathcal{A})} \Phi(A)$. Our goal is to find some $\hat{A}$ such that $|\Phi(\hat{A}) - \Phi(A^*)| \leq \varepsilon$. We develop algorithms for this problem for range spaces with bounded VC-dimension, as well as significant improvements for those defined by balls, halfspaces, and axis-aligned rectangles. This problem has applications in many areas, including discrepancy evaluation, classification, and spatial scan statistics.
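To make the problem statement concrete, the following is a minimal brute-force sketch of finding the $\Phi$-maximum range over axis-aligned rectangles in the plane. It is not the paper's algorithm: it enumerates every rectangle whose sides pass through input coordinates and evaluates $\Phi$ exactly. The choice $\Phi(A) = |r_A - b_A|$ and the names `phi` and `max_range_bruteforce` are assumptions made for illustration; the abstract leaves $\Phi$ unspecified.

```python
from itertools import combinations_with_replacement


def phi(r_A, b_A):
    # Assumed example of Phi: absolute difference of the two sums.
    return abs(r_A - b_A)


def max_range_bruteforce(points):
    """Exhaustively search axis-aligned rectangles for the Phi-maximum range.

    points: list of (x, y, r, b) tuples, where r and b are the two
    values attached to the point at (x, y).
    Returns (best_value, (x_lo, x_hi, y_lo, y_hi)).
    """
    xs = sorted({p[0] for p in points})
    ys = sorted({p[1] for p in points})
    best_value, best_rect = float("-inf"), None
    # Only rectangles with sides on input coordinates need to be checked:
    # shrinking any rectangle to such sides does not change its contents.
    for x_lo, x_hi in combinations_with_replacement(xs, 2):
        for y_lo, y_hi in combinations_with_replacement(ys, 2):
            inside = [
                (r, b)
                for x, y, r, b in points
                if x_lo <= x <= x_hi and y_lo <= y <= y_hi
            ]
            r_A = sum(r for r, _ in inside)
            b_A = sum(b for _, b in inside)
            value = phi(r_A, b_A)
            if value > best_value:
                best_value, best_rect = value, (x_lo, x_hi, y_lo, y_hi)
    return best_value, best_rect
```

This exact search checks a number of rectangles that grows with the fourth power of the number of points, which is the kind of cost an $\varepsilon$-approximate answer $\hat{A}$ is meant to avoid.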

Cite

@article{arxiv.1804.11287,
  title  = {Computing Approximate Statistical Discrepancy},
  author = {Michael Matheny and Jeff M. Phillips},
  journal= {arXiv preprint arXiv:1804.11287},
  year   = {2018}
}