Related papers: Significance testing without truth

Significance Tests in Climate Science

A large fraction of papers in the climate literature includes erroneous uses of significance tests. A Bayesian analysis is presented to highlight the meaning of significance tests and why typical misuse occurs. It is concluded that a…

Atmospheric and Oceanic Physics · Physics 2016-08-24 Maarten H. P. Ambaum

A Statistical Significance Simulation Study for the General Scientist

When a scientist performs an experiment they normally acquire a set of measurements and are expected to demonstrate that their results are "statistically significant" thus confirming whatever hypothesis they are testing. The main method for…

Other Statistics · Statistics 2011-09-30 Jacob Levman

On Statistical Non-Significance

Significance tests are probably the most extended form of inference in empirical research, and significance is often interpreted as providing greater informational content than non-significance. In this article we show, however, that…

Other Statistics · Statistics 2018-03-05 Alberto Abadie

I can see clearly now: reinterpreting statistical significance

Null hypothesis significance testing remains popular despite decades of concern about misuse and misinterpretation. We believe that much of the problem is due to language: significance testing has little to do with other meanings of the…

Other Statistics · Statistics 2018-10-16 Jonathan Dushoff, Morgan P. Kain, Benjamin M. Bolker

Faithful Model Evaluation for Model-Based Metrics

Statistical significance testing is used in natural language processing (NLP) to determine whether the results of a study or experiment are likely to be due to chance or if they reflect a genuine relationship. A key step in significance…

Computation and Language · Computer Science 2024-01-01 Palash Goyal, Qian Hu, Rahul Gupta

Statistical significance revisited

Statistical significance measures the reliability of a result obtained from a random experiment. We investigate the number of repetitions needed for a statistical result to have a certain significance. In the first step, we consider…

Methodology · Statistics 2024-06-19 Maike Tormählen, Galiya Klinkova, Michael Grabinski

When More Is Less: Pitfalls of significance testing

The controversy about statistical significance vs. scientific relevance is more than 100 years old. But still nowadays null hypothesis significance testing is considered as gold standard in many empirical fields from economics and social…

Applications · Statistics 2022-11-23 Uwe Hassler

More accurate tests for the statistical significance of result differences

Statistical significance testing of differences in values of metrics like recall, precision and balanced F-score is a necessary part of empirical natural language processing. Unfortunately, we find in a set of experiments that many commonly…

Computation and Language · Computer Science 2007-05-23 Alexander Yeh

A Goodness-of-Fit Test for Statistical Models

Statistical modeling plays a fundamental role in understanding the underlying mechanism of massive data (statistical inference) and predicting the future (statistical prediction). Although all models are wrong, researchers try their best to…

Methodology · Statistics 2020-06-17 Hangjin Jiang

Caveats for using statistical significance tests in research assessments

This paper raises concerns about the advantages of using statistical significance tests in research assessments as has recently been suggested in the debate about proper normalization procedures for citation indicators. Statistical…

Digital Libraries · Computer Science 2012-09-26 Jesper W. Schneider

How to Tell When a Result Will Replicate: Significance and Replication in Distributional Null Hypothesis Tests

There is a well-known problem in Null Hypothesis Significance Testing: many statistically significant results fail to replicate in subsequent experiments. We show that this problem arises because standard 'point-form null' significance…

Methodology · Statistics 2025-02-06 Fintan Costello, Paul Watts

Policy Implications of Statistical Estimates: A General Bayesian Decision-Theoretic Model for Binary Outcomes

How should we evaluate the effect of a policy on the likelihood of an undesirable event, such as conflict? The significance test has three limitations. First, relying on statistical significance misses the fact that uncertainty is a…

Methodology · Statistics 2022-05-03 Akisato Suzuki

When Evidence and Significance Collide

Null hypothesis statistical significance testing (NHST) is the dominant approach for evaluating results from randomized controlled trials. Whereas NHST comes with long-run error rate guarantees, its main inferential tool -- the p-value --…

Methodology · Statistics 2022-06-10 František Bartoš, Samuel Pawel, Eric-Jan Wagenmakers

The posterior probability of a null hypothesis given a statistically significant result

When researchers carry out a null hypothesis significance test, it is tempting to assume that a statistically significant result lowers Prob(H0), the probability of the null hypothesis being true. Technically, such a statement is…

Applications · Statistics 2022-04-19 Daniel J. Schad, Shravan Vasishth

Frequentist statistics as a theory of inductive inference

After some general remarks about the interrelation between philosophical and statistical thinking, the discussion centres largely on significance tests. These are defined as the calculation of p-values rather than as formal procedures for…

Statistics Theory · Mathematics 2007-06-13 Deborah G. Mayo, D. R. Cox

Simple Methods for Estimating Confidence Levels, or Tentative Probabilities, for Hypotheses Instead of P Values

In many fields of research null hypothesis significance tests and p values are the accepted way of assessing the degree of certainty with which research results can be extrapolated beyond the sample studied. However, there are very serious…

Methodology · Statistics 2020-01-14 Michael Wood

Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors

Statistical significance testing is widely accepted as a means to assess how well a difference in effectiveness reflects an actual difference between systems, as opposed to random noise because of the selection of topics. According to…

Information Retrieval · Computer Science 2019-06-07 Julián Urbano, Harlley Lima, Alan Hanjalic

deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks

A lot of Machine Learning (ML) and Deep Learning (DL) research is of an empirical nature. Nevertheless, statistical significance testing (SST) is still not widely used. This endangers true progress, as seeming improvements over a baseline…

Machine Learning · Computer Science 2022-04-15 Dennis Ulmer, Christian Hardmeier, Jes Frellsen

Null hypothesis significance tests: A mix-up of two different theories, the basis for widespread confusion and numerous misinterpretations

Null hypothesis statistical significance tests (NHST) are widely used in quantitative research in the empirical sciences including scientometrics. Nevertheless, since their introduction nearly a century ago significance tests have been…

Other Statistics · Statistics 2014-02-06 Jesper W. Schneider

Using the rejection sampling for finding tests

A new method based on the rejection sampling for finding statistical tests is proposed. This method is conceptually intuitive, easy to implement, and applicable for arbitrary dimension. To illustrate its potential applicability, three…

Methodology · Statistics 2026-03-11 Markku Kuismin