
P-values for classification

Statistics Theory; Machine Learning · v3 · 2008-06-26

Abstract

Let $(X,Y)$ be a random variable consisting of an observed feature vector $X \in \mathcal{X}$ and an unobserved class label $Y \in \{1,2,\ldots,L\}$ with unknown joint distribution. In addition, let $\mathcal{D}$ be a training data set consisting of $n$ completely observed independent copies of $(X,Y)$. Usual classification procedures provide point predictors (classifiers) $\widehat{Y}(X,\mathcal{D})$ of $Y$ or estimate the conditional distribution of $Y$ given $X$. In order to quantify the certainty of classifying $X$, we propose to construct for each $\theta = 1,2,\ldots,L$ a p-value $\pi_{\theta}(X,\mathcal{D})$ for the null hypothesis that $Y = \theta$, treating $Y$ temporarily as a fixed parameter. In other words, the point predictor $\widehat{Y}(X,\mathcal{D})$ is replaced with a prediction region for $Y$ with a certain confidence. We argue that (i) this approach is advantageous over traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single use and multiple use validity, as well as computational and graphical aspects.
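One simple way to see how an ordinary score can be turned into a nonparametric p-value $\pi_{\theta}(X,\mathcal{D})$ is a rank construction. Add $X$ to the training points of class $\theta$ as if $Y = \theta$, score every point in that pooled set by the same rule, and report the fraction of pooled points that look at least as atypical as $X$. The Python sketch below is an illustration under stated assumptions, not necessarily the authors' exact procedure. The nearest-neighbour distance score and the helper names `nn_score`, `class_p_value` and `prediction_region` are hypothetical choices. Because the score treats $X$ and the class-$\theta$ training points symmetrically, the p-value satisfies $P(\pi_{\theta} \le \alpha) \le \alpha$ whenever $Y = \theta$ and these points are exchangeable. The prediction region $\{\theta : \pi_{\theta}(X,\mathcal{D}) > \alpha\}$ then contains $Y$ with probability at least $1 - \alpha$.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def nn_score(i: int, points: List[Point]) -> float:
    """Nonconformity score (assumed): distance from points[i] to its nearest other point."""
    return min(math.dist(points[i], points[j])
               for j in range(len(points)) if j != i)


def class_p_value(x: Point, X_train: List[Point], y_train: List[int],
                  theta: int) -> float:
    """Hypothetical rank-based p-value for the null hypothesis Y = theta."""
    same_class = [p for p, y in zip(X_train, y_train) if y == theta]
    if not same_class:
        return 1.0  # no class-theta training data, so theta cannot be rejected
    # Treat theta as the label of x and pool x with the class-theta training points.
    augmented = same_class + [x]
    scores = [nn_score(i, augmented) for i in range(len(augmented))]
    s_x = scores[-1]
    # Fraction of pooled points (x included) that are at least as atypical as x;
    # with m class-theta points the smallest attainable value is 1 / (m + 1).
    return sum(s >= s_x for s in scores) / len(augmented)


def prediction_region(x: Point, X_train: List[Point], y_train: List[int],
                      labels: Sequence[int], alpha: float = 0.05) -> Set[int]:
    """Labels not rejected at level alpha (covers Y w.p. >= 1 - alpha under exchangeability)."""
    return {theta for theta in labels
            if class_p_value(x, X_train, y_train, theta) > alpha}


if __name__ == "__main__":
    X_train = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
    y_train = [1, 1, 1, 1, 2, 2, 2, 2]
    x = (0.5, 0.5)
    print(class_p_value(x, X_train, y_train, 1))  # 1.0
    print(class_p_value(x, X_train, y_train, 2))  # 0.2
    print(prediction_region(x, X_train, y_train, [1, 2], alpha=0.25))  # {1}
```

With only four training points per class, no p-value in this toy example can fall below $1/5$. At $\alpha = 0.05$ neither label is excluded, while at $\alpha = 0.25$ the region shrinks to $\{1\}$.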


Cite

@article{arxiv.0801.2934,
  title  = {P-values for classification},
  author = {Lutz Duembgen and Bernd-Wolfgang Igl and Axel Munk},
  journal= {arXiv preprint arXiv:0801.2934},
  year   = {2008}
}

Comments

Published in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org), at http://dx.doi.org/10.1214/08-EJS245
