Classification with High-Dimensional Sparse Samples

Dayu Huang; Sean Meyn

doi:10.1109/ISIT.2012.6283985

Information Theory 2016-04-18 v3 math.IT Statistics Theory Statistics Theory

Abstract

The binary classification problem is to determine which of two distributions generated a length-$n$ test sequence. The two distributions are unknown; two training sequences of length $N$, one from each distribution, are observed. The distributions share an alphabet of size $m$, which is significantly larger than $n$ and $N$. How do $N$, $n$, and $m$ affect the probability of classification error? We characterize the achievable error rate in a high-dimensional setting in which $N$, $n$, and $m$ all tend to infinity, under the assumption that the probability of any symbol is $O(m^{-1})$. The results are: 1. There exists an asymptotically consistent classifier if and only if $m=o(\min\{N^2,Nn\})$. This extends the previous consistency result in [1] to the case $N\neq n$. 2. For the sparse sample case where $\max\{n,N\}=o(m)$, finer results are obtained: the best achievable probability of error decays as $-\log(P_e)=J \min\{N^2, Nn\}(1+o(1))/m$ with $J>0$. 3. A weighted coincidence-based classifier has non-zero generalized error exponent $J$. 4. The $\ell_2$-norm based classifier has $J=0$.
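To make the setting concrete, the following is a minimal sketch, not the authors' construction. It assumes a plain (unweighted) pair-coincidence count and a squared $\ell_2$ distance between empirical distributions; the abstract does not specify the weights or the exact form of either classifier, so all function names and the decision rules here are illustrative assumptions. The helper `error_exponent_scale` only evaluates the factor $\min\{N^2, Nn\}/m$ that multiplies $J$ in result 2.

```python
import random
from collections import Counter


def sample(weights, length, rng):
    """Draw `length` i.i.d. symbols from range(len(weights)) with the given weights."""
    return rng.choices(range(len(weights)), weights=weights, k=length)


def l2_distance(test, train):
    """Squared l2 distance between the empirical distributions of two sequences."""
    p, q = Counter(test), Counter(train)
    n, N = len(test), len(train)
    return sum((p[s] / n - q[s] / N) ** 2 for s in set(p) | set(q))


def coincidences(test, train):
    """Count (test position, training position) pairs that carry the same symbol.

    Assumption: unweighted count; the paper's classifier uses weights that the
    abstract does not give.
    """
    p, q = Counter(test), Counter(train)
    return sum(p[s] * q[s] for s in p)


def classify_l2(test, train0, train1):
    """Pick the training sequence whose empirical distribution is closer in l2."""
    return 0 if l2_distance(test, train0) <= l2_distance(test, train1) else 1


def classify_coincidence(test, train0, train1):
    """Pick the training sequence that shares more symbol coincidences with the test."""
    return 0 if coincidences(test, train0) >= coincidences(test, train1) else 1


def error_exponent_scale(N, n, m):
    """Factor min{N^2, N n} / m that multiplies J in -log(P_e)."""
    return min(N * N, N * n) / m


if __name__ == "__main__":
    rng = random.Random(0)
    m, N, n = 2000, 400, 400  # sparse regime: max{n, N} is small relative to m
    # Two hypothetical distributions with every symbol probability O(1/m).
    w0 = [1.5 if s < m // 2 else 0.5 for s in range(m)]
    w1 = [0.5 if s < m // 2 else 1.5 for s in range(m)]
    train0, train1 = sample(w0, N, rng), sample(w1, N, rng)
    test = sample(w0, n, rng)  # true label is 0
    print("scale:", error_exponent_scale(N, n, m))
    print("l2 decision:", classify_l2(test, train0, train1))
    print("coincidence decision:", classify_coincidence(test, train0, train1))
```

In the sparse regime most symbols appear at most once, so both statistics are driven by the few symbols that repeat across sequences; the abstract's results 3 and 4 concern how well each type of statistic uses that information asymptotically, which a single simulated run like this cannot demonstrate.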

Keywords

group testing classification gaussian estimation

Cite

@article{arxiv.1202.1574,
  title  = {Classification with High-Dimensional Sparse Samples},
  author = {Dayu Huang and Sean Meyn},
  journal= {arXiv preprint arXiv:1202.1574},
  year   = {2016}
}

Comments

Final draft submitted to ISIT 2012.
