Classification with High-Dimensional Sparse Samples
Abstract
The task of the binary classification problem is to determine which of two distributions has generated a length-$n$ test sequence. The two distributions are unknown; two training sequences of length $N$, one from each distribution, are observed. The distributions share an alphabet of size $m$, which is significantly larger than $n$ and $N$. How do $N$, $n$, and $m$ affect the probability of classification error? We characterize the achievable error rate in a high-dimensional setting in which $N$, $n$, and $m$ all tend to infinity, under the assumption that the probability of any symbol is $O(m^{-1})$. The results are:
1. There exists an asymptotically consistent classifier if and only if $m = o(\min\{N^2, Nn\})$. This extends the previous consistency result in [1] to the case $N \neq n$.
2. For the sparse sample case where $\max\{n, N\} = o(m)$, finer results are obtained: the best achievable probability of error decays as $-\log(P_e) = J \min\{N^2, Nn\}(1 + o(1))/m$ with $J > 0$.
3. A weighted coincidence-based classifier has non-zero generalized error exponent $J$.
4. The $\ell_2$-norm based classifier has $J = 0$.
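To make the idea of a coincidence-based decision rule concrete, here is a minimal sketch in Python. It is not the weighted classifier analyzed in the paper, whose weights the abstract does not specify. The function names `coincidence_count` and `classify`, the unweighted statistic, and the tie-breaking rule are all assumptions made for illustration. The sketch counts how often symbols of the test sequence also appear in each training sequence and picks the training sequence with more matches.

```python
from collections import Counter


def coincidence_count(test, train):
    """Count pairs (i, j) with test[i] == train[j] (unweighted; an assumption)."""
    test_counts = Counter(test)
    train_counts = Counter(train)
    # Counter returns 0 for symbols that never appear in the training sequence.
    return sum(c * train_counts[s] for s, c in test_counts.items())


def classify(test, train0, train1):
    """Return 0 or 1: the index of the training sequence with more coincidences.

    Ties are broken in favor of 0 (an arbitrary choice for this sketch).
    """
    c0 = coincidence_count(test, train0)
    c1 = coincidence_count(test, train1)
    return 0 if c0 >= c1 else 1


if __name__ == "__main__":
    test = [3, 7, 7]
    train0 = [7, 1, 2, 7]
    train1 = [3, 4, 5, 6]
    print(classify(test, train0, train1))
```

In the sparse regime described above, most symbols never repeat, so repeated symbols shared between the test and training sequences carry most of the usable information; this is the intuition the sketch is meant to convey, not a reproduction of the paper's analysis.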
Cite
@article{arxiv.1202.1574,
title = {Classification with High-Dimensional Sparse Samples},
author = {Dayu Huang and Sean Meyn},
journal = {arXiv preprint arXiv:1202.1574},
year = {2012}
}
Comments
final draft submitted to ISIT 2012