Related papers: Detecting p-hacking
A flourishing empirical literature investigates the prevalence of $p$-hacking based on the distribution of $p$-values across studies. Interpreting results in this literature requires a careful understanding of the power of methods for…
We present the expected values from p-value hacking as a choice of the minimum p-value among $m$ independent tests, which can be considerably lower than the "true" p-value, even with a single trial, owing to the extreme skewness of the…
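Here is a minimal sketch of the minimum-p-value mechanism. It rests on assumptions the truncated abstract does not state: the $m$ tests are independent, the null is true, and each test statistic is continuous, so each p-value is Uniform(0,1). Then P(min > t) = (1 - t)^m, which gives E[min p] = 1/(m + 1) and a probability of 1 - (1 - alpha)^m that at least one p-value falls below alpha. The function names are hypothetical. The sketch covers only this null case, not the paper's full analysis.

```python
import numpy as np


def expected_min_p(m: int) -> float:
    """Exact E[min(P_1, ..., P_m)] for m i.i.d. Uniform(0,1) null p-values (assumed setting)."""
    # P(min > t) = (1 - t)^m, so E[min] = integral_0^1 (1 - t)^m dt = 1 / (m + 1).
    return 1.0 / (m + 1)


def false_positive_rate(m: int, alpha: float = 0.05) -> float:
    """Probability that at least one of m independent null tests gives p < alpha."""
    return 1.0 - (1.0 - alpha) ** m


def simulate_min_p(m: int, n_reps: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[min p] when only the smallest of m null p-values is reported."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(size=(n_reps, m))  # one row of m null p-values per simulated study
    return float(p.min(axis=1).mean())


if __name__ == "__main__":
    for m in (1, 2, 5, 20):
        print(m, expected_min_p(m), round(simulate_min_p(m), 4), round(false_positive_rate(m), 4))
```

With $m = 5$ tests at alpha = 0.05, the chance of at least one "significant" result under the null is 1 - 0.95^5, or about 0.226. The expected reported p-value is 1/6.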
We show that some forms of p-hacking cannot be detected by examining the histogram of t-statistics or their p-values. Even when p-hacking is detectable, standard tests may lack power. We propose a novel test that detects every form of…
Hypothesis testing results often rely on simple, yet important assumptions about the behaviour of the distribution of p-values under the null and the alternative. We examine tests for one dimensional parameters of interest that converge to…
P-hacking is prevalent in reality but absent from classical hypothesis testing theory. As a consequence, significant results are much more common than they are supposed to be when the null hypothesis is in fact true. In this paper, we build…
$P$-values that are derived from continuously distributed test statistics are typically uniformly distributed on $(0,1)$ under least favorable parameter configurations (LFCs) in the null hypothesis. Conservativeness of a $p$-value $P$…
The randomized $p$-value, (nonrandomized) mid-$p$-value and abstract randomized $p$-value have all been recommended for testing a null hypothesis whenever the test statistic has a discrete distribution. This paper provides a unifying…
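When the statistic is discrete, the three notions differ only in how they allocate the probability of the observed value. The sketch assumes an illustrative one-sided binomial test (H0: p = p0 against H1: p > p0), which is not taken from the paper, and its helper names are hypothetical. With X the observed count x:

- the conventional p-value is P(X >= x);
- the mid-p-value is P(X > x) + 0.5 P(X = x);
- the randomized p-value replaces the 0.5 with an independent U ~ Uniform(0,1).

```python
import math
import random
from typing import Optional, Tuple


def binom_pmf(k: int, n: int, p0: float) -> float:
    """P(X = k) for X ~ Binomial(n, p0)."""
    return math.comb(n, k) * p0**k * (1.0 - p0) ** (n - k)


def upper_tail_pvalues(
    x: int, n: int, p0: float, u: Optional[float] = None
) -> Tuple[float, float, float]:
    """Conventional, mid- and randomized p-values for H0: p = p0 vs H1: p > p0."""
    p_greater = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))  # P(X > x)
    p_equal = binom_pmf(x, n, p0)  # P(X = x)
    conventional = p_greater + p_equal  # P(X >= x): valid but conservative for discrete X
    mid_p = p_greater + 0.5 * p_equal  # mid-p-value: the randomized p-value averaged over U
    if u is None:
        u = random.random()  # U ~ Uniform(0,1), drawn independently of the data
    randomized = p_greater + u * p_equal  # exactly Uniform(0,1) under H0
    return conventional, mid_p, randomized


if __name__ == "__main__":
    print(upper_tail_pvalues(x=8, n=10, p0=0.5, u=0.5))
```

Take x = 8 successes out of n = 10 with p0 = 0.5. The conventional p-value is 56/1024 (about 0.0547) and the mid-p-value is 33.5/1024 (about 0.0327). With u = 0.5, the randomized p-value coincides with the mid-p-value.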
P-hacking poses challenges to traditional hypothesis testing. In this paper, we propose a robust method for the one-sample significance test that can protect against p-hacking from sample manipulation. Precisely, assuming a sequential…
We are concerned with testing replicability hypotheses for many endpoints simultaneously. This constitutes a multiple test problem with composite null hypotheses. Traditional $p$-values, which are computed under least favourable parameter…
The notion of p-value is a fundamental concept in statistical inference and has been widely used for reporting outcomes of hypothesis tests. However, the p-value is often misinterpreted, misused or miscommunicated in practice. Part of the issue…
Publication bias and p-hacking are two well-known phenomena that strongly affect the scientific literature and cause severe problems in meta-analyses. Due to these phenomena, the assumptions of meta-analyses are seriously violated and the…
We study a large-scale one-sided multiple testing problem in which test statistics follow normal distributions with unit variance, and the goal is to identify signals with positive mean effects. A conventional approach is to compute…
This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are…
Given samples from two non-negative random variables, we propose a family of tests for the null hypothesis that one random variable stochastically dominates the other at the second order. Test statistics are obtained as functionals of the…
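The abstract's description of the functionals is truncated, so this is only a sketch of one standard building block, not the paper's statistic. For non-negative variables, X dominates Y at the second order when D_X(t) <= D_Y(t) for all t >= 0, where D(t) is the integral of the CDF F from 0 to t. For an empirical CDF, this integral equals the sample mean of (t - X_i)_+. The scaled supremum of D_X - D_Y used here is one hypothetical choice of functional. Calibrating it, for example by a bootstrap, is omitted.

```python
import numpy as np


def integrated_cdf(sample, grid):
    """Empirical D(t) = integral_0^t F_n(s) ds = mean((t - X_i)_+), for non-negative samples."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    return np.maximum(grid[:, None] - sample[None, :], 0.0).mean(axis=1)


def ssd_sup_statistic(x, y):
    """Scaled sup_t [D_X(t) - D_Y(t)]; large values are evidence against 'X dominates Y (2nd order)'."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # D_X - D_Y is piecewise linear with kinks at the observations, so its sup is attained there.
    grid = np.sort(np.concatenate([x, y]))
    diff = integrated_cdf(x, grid) - integrated_cdf(y, grid)
    n, m = len(x), len(y)
    return float(np.sqrt(n * m / (n + m)) * max(diff.max(), 0.0))


if __name__ == "__main__":
    # X is constant at 2; Y has the same mean but is more spread out, so X should dominate Y.
    print(ssd_sup_statistic([2, 2, 2], [1, 3]), ssd_sup_statistic([1, 3], [2, 2, 2]))
```

D_X - D_Y is zero below the smallest observation and constant above the largest one. Evaluating it on the pooled sorted sample therefore recovers its supremum over t >= 0, and taking the maximum with 0 covers t = 0.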
Testing to see whether a given data set comes from some specified distribution is among the oldest types of problems in statistics. Many such tests have been developed and their performance studied. The general result has been that while a…
Many commonly used test statistics are based on a norm measuring the evidence against the null hypothesis. To understand how the choice of a norm affects power properties of tests in high dimensions, we study the consistency sets of…
In traditional hypothesis testing, one must pre-specify the significance level $\alpha$ to bound the 'size' of the test: its probability of falsely rejecting the hypothesis. Indeed, a data-dependent selection of $\alpha$ would generally distort…
Graphical tests assess whether a function of interest departs from an envelope of functions generated under a simulated null distribution. This approach originated in spatial statistics, but has recently gained some popularity in functional…
Many multiple testing procedures make use of the p-values from the individual pairs of hypothesis tests, and are valid if the p-value statistics are independent and uniformly distributed under the null hypotheses. However, it has recently…
Attacks on the P-value are nothing new, but the recent attacks are increasingly serious. They come from more mainstream sources, with widening targets such as a call to retire significance testing altogether. While well meaning, I…