P-values for classification

Lutz Duembgen; Bernd-Wolfgang Igl; Axel Munk

doi:10.1214/08-EJS245

Subjects: Statistics Theory; Machine Learning. Version v3, 2008-06-26.

Abstract

Let $(X,Y)$ be a random variable consisting of an observed feature vector $X\in \mathcal{X}$ and an unobserved class label $Y\in \{1,2,\ldots,L\}$ with unknown joint distribution. In addition, let $\mathcal{D}$ be a training data set consisting of $n$ completely observed independent copies of $(X,Y)$. Usual classification procedures provide point predictors (classifiers) $\widehat{Y}(X,\mathcal{D})$ of $Y$ or estimate the conditional distribution of $Y$ given $X$. In order to quantify the certainty of classifying $X$, we propose to construct for each $\theta =1,2,\ldots,L$ a p-value $\pi_{\theta}(X,\mathcal{D})$ for the null hypothesis that $Y=\theta$, treating $Y$ temporarily as a fixed parameter. In other words, the point predictor $\widehat{Y}(X,\mathcal{D})$ is replaced with a prediction region for $Y$ with a given confidence level. We argue that (i) this approach is advantageous compared with traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single use and multiple use validity, as well as computational and graphical aspects.
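The idea of turning a classifier into nonparametric p-values can be illustrated with a rank-based sketch in the spirit of conformal prediction. This is an illustration under stated assumptions, not necessarily the authors' exact construction. Assumed (not stated in the abstract): a statistic $T_\theta(x,\mathcal{D})$ whose large values speak against $Y=\theta$ (here, hypothetically, the Euclidean distance from $x$ to the nearest training point of class $\theta$), and the p-value $\pi_\theta(X,\mathcal{D}) = \bigl(1 + \#\{i : Y_i=\theta,\ T_\theta(X_i,\mathcal{D}_i(X,\theta)) \ge T_\theta(X,\mathcal{D})\}\bigr)/(N_\theta+1)$, where $N_\theta$ is the number of training points with label $\theta$ and $\mathcal{D}_i(X,\theta)$ is $\mathcal{D}$ with $(X_i,Y_i)$ replaced by $(X,\theta)$. If $Y=\theta$, the $N_\theta+1$ feature vectors of class $\theta$ are exchangeable, which yields $P(\pi_Y(X,\mathcal{D}) \le \alpha) \le \alpha$. Hence the region $\{\theta : \pi_\theta(X,\mathcal{D}) > \alpha\}$ contains $Y$ with probability at least $1-\alpha$. The names `nn_distance`, `p_value` and `prediction_region` are hypothetical.

```python
import math
from typing import Callable, Iterable, List, Set, Tuple

Point = Tuple[float, ...]
Sample = List[Tuple[Point, int]]  # training data D: (feature vector, class label)
Statistic = Callable[[Point, int, Sample], float]


def nn_distance(x: Point, theta: int, data: Sample) -> float:
    """Hypothetical T_theta(x, D): distance from x to the nearest class-theta
    point in D. Large values speak against the hypothesis Y = theta."""
    return min(math.dist(x, xi) for xi, yi in data if yi == theta)


def p_value(x: Point, theta: int, data: Sample, stat: Statistic) -> float:
    """Rank-based p-value for the null hypothesis Y = theta (assumed construction)."""
    members = [i for i, (_, y) in enumerate(data) if y == theta]
    if not members:
        return 1.0  # N_theta = 0 gives (0 + 1) / (0 + 1)
    t_new = stat(x, theta, data)  # T_theta(X, D)
    count = 0
    for i in members:
        xi = data[i][0]
        d_i = list(data)
        d_i[i] = (x, theta)  # D_i(X, theta): replace (X_i, theta) by (X, theta)
        # Ties are counted, which can only make the p-value larger (conservative).
        if stat(xi, theta, d_i) >= t_new:
            count += 1
    return (count + 1) / (len(members) + 1)


def prediction_region(x: Point, data: Sample, stat: Statistic,
                      labels: Iterable[int], alpha: float) -> Set[int]:
    """All labels theta that are not rejected at level alpha."""
    return {theta for theta in labels if p_value(x, theta, data, stat) > alpha}


# Toy training data (made up for illustration): two well-separated classes.
data: Sample = [
    ((0.0, 0.0), 1), ((0.5, 0.2), 1), ((0.1, 0.8), 1), ((1.0, 0.4), 1),
    ((5.0, 5.0), 2), ((5.5, 4.8), 2), ((4.6, 5.3), 2), ((5.2, 5.9), 2),
]
x: Point = (0.3, 0.3)

print(p_value(x, 1, data, nn_distance), p_value(x, 2, data, nn_distance))
print(prediction_region(x, data, nn_distance, [1, 2], alpha=0.25))
```

With only four training points per class, the smallest attainable p-value is $1/5$. For the point $x=(0.3,0.3)$ the sketch gives $\pi_1 = 1$ and $\pi_2 = 0.2$, so class 2 is excluded at $\alpha = 0.25$ but not at $\alpha = 0.1$. The prediction region then stays $\{1,2\}$, honestly reflecting the small class sizes instead of forcing a single label.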

Keywords

statistical inference; model selection; hypothesis testing

Cite

@article{arxiv.0801.2934,
  title  = {P-values for classification},
  author = {Lutz Duembgen and Bernd-Wolfgang Igl and Axel Munk},
  journal= {arXiv preprint arXiv:0801.2934},
  year   = {2008}
}

Comments

Published at http://dx.doi.org/10.1214/08-EJS245 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).