Related papers: QuickMMCTest - Quick Multiple Monte Carlo Testing
Consider testing multiple hypotheses using tests that can only be evaluated by simulation, such as permutation tests or bootstrap tests. This article introduces MMCTest, a sequential algorithm which gives, with arbitrarily high probability,…
The steadily increasing size of scientific Monte Carlo simulations and the desire for robust, correct, and reproducible results necessitates rigorous testing procedures for scientific simulations in order to detect numerical problems and…
Monte Carlo (MC) permutation test is considered the gold standard for statistical hypothesis testing, especially when standard parametric assumptions are not clear or likely to fail. However, in modern data science settings where a large…
We are concerned with a situation in which we would like to test multiple hypotheses with tests whose p-values cannot be computed explicitly but can be approximated using Monte Carlo simulation. This scenario occurs widely in practice. We…
We consider a statistical test whose p-value can only be approximated using Monte Carlo simulations. We are interested in deciding whether the p-value for an observed data set lies above or below a given threshold such as 5%. We want to…
Multiple hypothesis tests are often carried out in practice using p-value estimates obtained with bootstrap or permutation tests since the analytical p-values underlying all hypotheses are usually unknown. This article considers the…
Conventional multiple hypothesis tests use step-up, step-down, or closed testing methods to control the overall error rates. We will discuss marrying these methods with adaptive multistage sampling rules and stopping rules to perform…
The Monte Carlo algorithm is increasingly utilized, with its central step involving computer-based random sampling from stochastic models. While both Markov Chain Monte Carlo (MCMC) and Reject Monte Carlo serve as sampling methods, the…
Hypothesis tests calibrated by (re)sampling methods (such as permutation, rank and bootstrap tests) are useful tools for statistical analysis, at the computational cost of requiring Monte-Carlo sampling for calibration. It is common and…
In a Monte-Carlo test, the observed dataset is fixed, and several resampled or permuted versions of the dataset are generated in order to test a null hypothesis that the original dataset is exchangeable with the resampled/permuted ones.…
When permutation methods are used in practice, often a limited number of random permutations are used to decrease the computational burden. However, most theoretical literature assumes that the whole permutation group is used, and methods…
Software packages usually report the results of statistical tests using p-values. Users often interpret these by comparing them to standard thresholds, e.g. 0.1%, 1% and 5%, which is sometimes reinforced by a star rating (***, **, *). We…
Monte Carlo and Quasi-Monte Carlo methods present a convenient approach for approximating the expected value of a random variable. Algorithms exist to adaptively sample the random variable until a user defined absolute error tolerance is…
We propose approaches for testing implementations of Markov Chain Monte Carlo methods as well as of general Monte Carlo methods. Based on statistical hypothesis tests, these approaches can be used in a unit testing framework to, for…
Monte Carlo simulations are based on the manipulation of random numbers to evaluate probable outcomes, with applicability in a variety of different fields. By assigning probabilities, which can be determined a priori, to various events, it…
Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current…
Testing between hypotheses, when independent sampling is possible, is a well developed subject. In this paper, we propose hypothesis tests that are applicable when the samples are obtained using Markov chain Monte Carlo. These tests are…
The bootstrap is a versatile inference method that has proven powerful in many statistical problems. However, when applied to modern large-scale models, it could face substantial computation demand from repeated data resampling and model…
Importance sampling is a common technique for Monte Carlo approximation, including Monte Carlo approximation of p-values. Here it is shown that a simple correction of the usual importance sampling p-values creates valid p-values, meaning…
Estimating failure probabilities of engineering systems is an important problem in many engineering fields. In this work we consider such problems where the failure probability is extremely small (e.g $\leq10^{-10}$). In this case, standard…