Related papers: Beyond Neyman-Pearson: e-values enable hypothesis …
In traditional hypothesis testing one must pre-specify the significance level $\alpha$ to bound the `size' of the test: its probability to falsely reject the hypothesis. Indeed, a data-dependent selection of $\alpha$ would generally distort…
The validity of classical hypothesis testing requires the significance level $\alpha$ be fixed before any statistical analysis takes place. This is a stringent requirement. For instance, it prohibits updating $\alpha$ during (or after) an…
We discuss systematically two versions of confidence regions: those based on p-values and those based on e-values, a recent alternative to p-values. Both versions can be applied to multiple hypothesis testing, and in this paper we are…
The literature on hypothesis testing with data-dependent and post-hoc significance levels relies on a particular extension of the Type-I error to data-dependent levels. Existing arguments for this extension are heuristic, and primarily…
The e-value is swiftly rising in prominence in many applications of hypothesis testing and multiple testing, yet its relationship to classical testing theory remains elusive. We unify e-values and classical testing into a single 'continuous…
A recurring debate in the philosophy of statistics concerns what, exactly, should count as a measure of evidence for or against a given hypothesis. P-values, likelihood ratios, and Bayes factors all have their defenders. In this paper we…
We derive inferential procedures for large sample sizes that remain valid under data-dependent significance levels (so-called "post-hoc valid inference"). Classical statistical tools require that the significance level -- the "type-I error"…
Bayesian hypothesis testing via Bayes factors offers a principled alternative to classical p-value methods in meta-analysis, particularly suited to its cumulative and sequential nature. Unlike commonly reported p-values for standard null…
We develop the theory of hypothesis testing based on the e-value, a notion of evidence that, unlike the p-value, allows for effortlessly combining results from several studies in the common scenario where the decision to perform a new study…
A fundamental assumption of classical hypothesis testing is that the significance threshold $\alpha$ is chosen independently from the data. The validity of confidence intervals likewise relies on choosing $\alpha$ beforehand. We point out…
Conformal prediction is a powerful framework for distribution-free uncertainty quantification. The standard approach to conformal prediction relies on comparing the ranks of prediction scores: under exchangeability, the rank of a future…
Multiple testing of a single hypothesis and testing multiple hypotheses are usually done in terms of p-values. In this paper we replace p-values with their natural competitor, e-values, which are closely related to betting, Bayes factors,…
Quality statistical inference requires a sufficient amount of data, which can be missing or hard to obtain. To this end, prediction-powered inference has risen as a promising methodology, but existing approaches are largely limited to…
We study how to combine p-values and e-values, and design multiple testing procedures where both p-values and e-values are available for every hypothesis. Our results provide a new perspective on multiple testing with data-driven weights:…
This article gives a survey of the e-value, a statistical significance measure a.k.a. the evidence rendered by observational data, X, in support of a statistical hypothesis, H, or, the other way around, the epistemic value of H given X. The…
The notion of p-value is a fundamental concept in statistical inference and has been widely used for reporting outcomes of hypothesis tests. However, p-value is often misinterpreted, misused or miscommunicated in practice. Part of the issue…
Probability forecasts for binary events play a central role in many applications. Their quality is commonly assessed with proper scoring rules, which assign forecasts a numerical score such that a correct forecast achieves a minimal…
The rejection threshold used for e-values and e-processes is by default set to $1/\alpha$ for a guaranteed type-I error control at $\alpha$, based on Markov's and Ville's inequalities. This threshold can be wasteful in practical…
The American Statistical Association (ASA) statement on statistical significance and P-values \cite{wasserstein2016asa} cautioned statisticians against making scientific decisions solely on the basis of traditional P-values. The statement…
P-values are a mainstay in statistics but are often misinterpreted. We propose a new interpretation of p-value as a meaningful plausibility, where this is to be interpreted formally within the inferential model framework. We show that, for…