Related papers: P-values for classification

Nonparametric inference for $P(X<Y)$ with paired variables

We propose two classes of nonparametric point estimators of $\theta=P(X<Y)$ in the case where $(X,Y)$ are paired, possibly dependent, absolutely continuous random variables. The proposed estimators are based on nonparametric estimators of…

Methodology · Statistics 2013-03-27 J. A. Montoya, F. J. Rubio

p-Values for Model Evaluation

Deciding whether a model provides a good description of data is often based on a goodness-of-fit criterion summarized by a p-value. Although there is considerable confusion concerning the meaning of p-values, leading to their misuse, they…

Data Analysis, Statistics and Probability · Physics 2013-05-29 Frederik Beaujean, Allen Caldwell, Daniel Kollar, Kevin Kroeninger

Divergence vs. Decision P-values: A Distinction Worth Making in Theory and Keeping in Practice

There are two distinct definitions of 'P-value' for evaluating a proposed hypothesis or model for the process generating an observed dataset. The original definition starts with a measure of the divergence of the dataset from what was…

Other Statistics · Statistics 2023-09-25 Sander Greenland

Invariant $P$-values for model checking

$P$-values have been the focus of considerable criticism based on various considerations. Still, the $P$-value represents one of the most commonly used statistical tools. When assessing the suitability of a single hypothesized distribution,…

Statistics Theory · Mathematics 2010-01-13 Michael Evans, Gun Ho Jang

Post-selection inference for quantifying uncertainty in changes in variance

Quantifying uncertainty in detected changepoints is an important problem. However, it is challenging, as the naive approach would use the data twice: first to detect the changes, and then to test them. This will bias the test, and can lead to…

Methodology · Statistics 2026-05-11 Rachel Carrington, Paul Fearnhead

Evaluating Black-Box Classifiers via Stable Adaptive Two-Sample Inference

We consider the problem of evaluating black-box multi-class classifiers. In the standard setup, we observe class labels $Y\in \{0,1,\ldots,M-1\}$ generated according to the conditional distribution $ Y|X \sim \text{…

Methodology · Statistics 2026-04-08 Yuchen Chen, Jing Lei

Repeated Observations for Classification

We study the problem of nonparametric classification with repeated observations. Let $\mathbf{X}$ be the $d$-dimensional feature vector and let $Y$ denote the label taking values in $\{1,\dots ,M\}$. In contrast to the usual setup with large sample size…

Information Theory · Computer Science 2023-07-20 Hüseyin Afşer, László Györfi, Harro Walk

Statistical significance in high-dimensional linear models

We propose a method for constructing p-values for general hypotheses in a high-dimensional linear model. The hypotheses can be local for testing a single regression parameter or they may be more global involving several up to all…

Methodology · Statistics 2013-10-14 Peter Bühlmann

Predictive Value Generalization Bounds

In this paper, we study a bi-criterion framework for assessing scoring functions in the context of binary classification. The positive and negative predictive values (ppv and npv, respectively) are conditional probabilities of the true…

Machine Learning · Statistics 2020-07-13 Keshav Vemuri, Nathan Srebro

Testing for parameter change in general integer-valued time series

We consider structural change in a class of discrete-valued time series whose conditional distribution follows a one-parameter exponential family. We propose a change-point test based on the maximum likelihood estimator of the…

Statistics Theory · Mathematics 2016-03-01 Mamadou Lamine Diop, William Kengne

Sequential Specification Tests to Choose a Model: A Change-Point Approach

Researchers faced with a sequence of candidate model specifications must often choose the best specification that does not violate a testable identification assumption. One option in this scenario is sequential specification tests:…

Methodology · Statistics 2023-07-25 Adam C. Sales

Evaluating model calibration in classification

Probabilistic classifiers output a probability distribution on target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their…

Machine Learning · Computer Science 2019-02-20 Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas B. Schön

Les p-values comme votes d'experts

The p-values are often implicitly used as a measure of evidence for the hypotheses of the tests. This practice has been analyzed with different approaches. It is generally accepted for the one-sided hypothesis problem, but it is often…

Statistics Theory · Mathematics 2007-06-13 Guy Morel

p-Value as the Strength of Evidence Measured by Confidence Distribution

The notion of p-value is a fundamental concept in statistical inference and has been widely used for reporting outcomes of hypothesis tests. However, p-value is often misinterpreted, misused or miscommunicated in practice. Part of the issue…

Methodology · Statistics 2020-02-03 Sifan Liu, Regina Liu, Min-ge Xie

Theoretical guarantees for change localization using conformal p-values

Changepoint localization aims to provide confidence sets for a changepoint (if one exists). Existing methods either rely on strong parametric assumptions, provide only asymptotic guarantees, or focus on a particular kind of…

Statistics Theory · Mathematics 2026-02-18 Swapnaneel Bhattacharyya, Aaditya Ramdas

Testing for Outliers with Conformal p-values

This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are…

Methodology · Statistics 2024-03-12 Stephen Bates, Emmanuel Candès, Lihua Lei, Yaniv Romano, Matteo Sesia

Shapley Value on Probabilistic Classifiers

Data valuation has become an increasingly significant discipline in data science due to the economic value of data. In the context of machine learning (ML), data valuation methods aim to equitably measure the contribution of each data point…

Machine Learning · Computer Science 2023-06-13 Xiang Li, Haocheng Xia, Jinfei Liu

Improved multivariate normal mean estimation with unknown covariance when p is greater than n

We consider the problem of estimating the mean vector of a p-variate normal $(\theta,\Sigma)$ distribution under invariant quadratic loss, $(\delta-\theta)'\Sigma^{-1}(\delta-\theta)$, when the covariance is unknown. We propose a new class…

Statistics Theory · Mathematics 2013-02-28 Didier Chételat, Martin T. Wells

Instance-Dependent PU Learning by Bayesian Optimal Relabeling

When learning from positive and unlabelled data, it is a strong assumption that the positive observations are randomly sampled from the distribution of $X$ conditional on $Y = 1$, where $X$ stands for the feature and $Y$ for the label. Most…

Machine Learning · Computer Science 2020-03-04 Fengxiang He, Tongliang Liu, Geoffrey I Webb, Dacheng Tao

Selective inference is easier with p-values

Selective inference is a subfield of statistics that enables valid inference after selection of a data-dependent question. In this paper, we introduce selectively dominant p-values, a class of p-values that allow practitioners to easily…

Methodology · Statistics 2024-11-22 Anav Sood