Related papers: P-values for classification
We propose two classes of nonparametric point estimators of $\theta=P(X<Y)$ in the case where $(X,Y)$ are paired, possibly dependent, absolutely continuous random variables. The proposed estimators are based on nonparametric estimators of…
Deciding whether a model provides a good description of data is often based on a goodness-of-fit criterion summarized by a p-value. Although there is considerable confusion concerning the meaning of p-values, leading to their misuse, they…
There are two distinct definitions of 'P-value' for evaluating a proposed hypothesis or model for the process generating an observed dataset. The original definition starts with a measure of the divergence of the dataset from what was…
$P$-values have been the focus of considerable criticism based on various considerations. Still, the $P$-value represents one of the most commonly used statistical tools. When assessing the suitability of a single hypothesized distribution,…
Quantifying uncertainty in detected changepoints is an important problem. However, it is challenging because the naive approach would use the data twice: first to detect the changes, and then to test them. This will bias the test, and can lead to…
We consider the problem of evaluating black-box multi-class classifiers. In the standard setup, we observe class labels $Y\in \{0,1,\ldots,M-1\}$ generated according to the conditional distribution $ Y|X \sim \text{…
We study the problem of nonparametric classification with repeated observations. Let $\mathbf{X}$ be the $d$-dimensional feature vector and let $Y$ denote the label taking values in $\{1,\dots,M\}$. In contrast to the usual setup with large sample size…
We propose a method for constructing p-values for general hypotheses in a high-dimensional linear model. The hypotheses can be local for testing a single regression parameter or they may be more global involving several up to all…
In this paper, we study a bi-criterion framework for assessing scoring functions in the context of binary classification. The positive and negative predictive values (ppv and npv, respectively) are conditional probabilities of the true…
We consider structural change in a class of discrete-valued time series whose conditional distribution follows a one-parameter exponential family. We propose a change-point test based on the maximum likelihood estimator of the…
Researchers faced with a sequence of candidate model specifications must often choose the best specification that does not violate a testable identification assumption. One option in this scenario is sequential specification tests:…
Probabilistic classifiers output a probability distribution on target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their…
P-values are often implicitly used as a measure of evidence for the hypotheses being tested. This practice has been analyzed with different approaches. It is generally accepted for the one-sided hypothesis problem, but it is often…
The p-value is a fundamental concept in statistical inference and has been widely used for reporting outcomes of hypothesis tests. However, the p-value is often misinterpreted, misused or miscommunicated in practice. Part of the issue…
Changepoint localization aims to provide confidence sets for a changepoint (if one exists). Existing methods either rely on strong parametric assumptions, provide only asymptotic guarantees, or focus on a particular kind of…
This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are…
Data valuation has become an increasingly significant discipline in data science due to the economic value of data. In the context of machine learning (ML), data valuation methods aim to equitably measure the contribution of each data point…
We consider the problem of estimating the mean vector of a p-variate normal $(\theta,\Sigma)$ distribution under invariant quadratic loss, $(\delta-\theta)'\Sigma^{-1}(\delta-\theta)$, when the covariance is unknown. We propose a new class…
When learning from positive and unlabelled data, it is a strong assumption that the positive observations are randomly sampled from the distribution of $X$ conditional on $Y = 1$, where $X$ stands for the feature and $Y$ the label. Most…
Selective inference is a subfield of statistics that enables valid inference after selection of a data-dependent question. In this paper, we introduce selectively dominant p-values, a class of p-values that allow practitioners to easily…