Related papers: Continuous Testing: Unifying Tests and E-values
Multiple testing of a single hypothesis and testing multiple hypotheses are usually done in terms of p-values. In this paper we replace p-values with their natural competitor, e-values, which are closely related to betting, Bayes factors,…
A standard practice in statistical hypothesis testing is to mention the p-value alongside the accept/reject decision. We show the advantages of mentioning an e-value instead. With p-values, it is not clear how to use an extreme observation…
We develop the theory of hypothesis testing based on the e-value, a notion of evidence that, unlike the p-value, allows for effortlessly combining results from several studies in the common scenario where the decision to perform a new study…
We study how to combine p-values and e-values, and design multiple testing procedures where both p-values and e-values are available for every hypothesis. Our results provide a new perspective on multiple testing with data-driven weights:…
We develop e-values and e-processes testing the null hypothesis that a distribution over nonnegative integers is monotone, and that a distribution over integers is unimodal given a certain mode. Our e-processes lead to tests of power one…
Forecasting and forecast evaluation are inherently sequential tasks. Predictions are often issued on a regular basis, such as every hour, day, or month, and their quality is monitored continuously. However, the classical statistical tools…
Conformal prediction is a powerful framework for distribution-free uncertainty quantification. The standard approach to conformal prediction relies on comparing the ranks of prediction scores: under exchangeability, the rank of a future…
A recurring debate in the philosophy of statistics concerns what, exactly, should count as a measure of evidence for or against a given hypothesis. P-values, likelihood ratios, and Bayes factors all have their defenders. In this paper we…
Given the well-known and fundamental problems with hypothesis testing via classical (point-form) significance tests, there has been a general move to alternative approaches, often focused on the Bayesian t-test. We show that the Bayesian…
There is a useful counterpart of conformal prediction for e-values, called conformal e-prediction. Conformal prediction can serve as basis for testing the assumption of exchangeability, leading to conformal testing. Similarly, conformal…
I propose a general quantum hypothesis testing theory that enables one to test hypotheses about any aspect of a physical system, including its dynamics, based on a series of observations. For example, the hypotheses can be about the…
We introduce the E-measure: a measure-like generalization of the E-value to a class of hypotheses. Unlike classical measures, E-measures are closed under infimums instead of addition. They arise from a compatibility axiom with logical…
We introduce the notion of p*-values (p*-variables), which generalizes p-values (p-variables) in several senses. The new notion has four natural interpretations: operational, probabilistic, Bayesian, and frequentist. A main example of a…
The literature on hypothesis testing with data-dependent and post-hoc significance levels relies on a particular extension of the Type-I error to data-dependent levels. Existing arguments for this extension are heuristic, and primarily…
Uniformly most powerful tests are statistical hypothesis tests that provide the greatest power against a fixed null hypothesis among all tests of a given size. In this article, the notion of uniformly most powerful tests is extended to the…
We point out that the traditional notion of test statistic is too narrow, and we propose a natural generalization that is arguably maximal. The study is restricted to simple statistical hypotheses.
In this paper we use e-values in the context of multiple hypothesis testing assuming that the base tests produce independent, or sequential, e-values. Our simulation and empirical studies and theoretical considerations suggest that, under…
Hypothesis testing via e-variables can be framed as a sequential betting game, where a player each round picks an e-variable. A good player's strategy results in an effective statistical test that rejects the null hypothesis as soon as…
Compared to p-values, e-values provably guarantee safe, valid inference. If the goal is to test multiple hypotheses simultaneously, one can construct e-values for each individual test and then use the recently developed e-BH procedure to…
Probability forecasts for binary events play a central role in many applications. Their quality is commonly assessed with proper scoring rules, which assign forecasts a numerical score such that a correct forecast achieves a minimal…