Related papers: Sample size effects in multivariate fitting of cor…
Measurement error is a pervasive challenge across many disciplines, yet its impact on sample size determination and the accuracy and precision of estimators regarding the association between an exposure and an outcome remains understudied…
Optimization software enables the solution of problems with millions of variables and associated parameters. These parameters are, however, often uncertain and represented with an analytical description of the parameter's distribution or…
When data do not conform to the hypothesis of a known sampling variance, fitting a constant to the set of measured values is a long-debated problem. Given the data, the fitting would require finding which measurand value is most…
Estimation of causal effects using machine learning methods has become an active research field in econometrics. In this paper, we study the finite sample performance of meta-learners for estimation of heterogeneous treatment effects under…
We provide finite-sample distribution approximations that are uniform in the parameter for inference in linear mixed models. The focus is on variances and covariances of random effects in cases where existing theory fails because their…
Besides the well-known effect of autocorrelations in time series of Monte Carlo simulation data resulting from the underlying Markov process, using the same data pool for computing various estimates entails additional cross correlations.…
We review statistical theories and numerical methods employed to consider the sample size dependence of the failure strength distribution of disordered materials. We first overview the analytical predictions of extreme value statistics and…
In Markov chain Monte Carlo (MCMC) simulations, thermal equilibrium quantities are estimated by an ensemble average over a sample set containing a large number of correlated samples. These samples are selected in accordance with the…
This study examines effects of calibration errors on model assumptions and data-analytic tools in direct calibration assays. These effects encompass induced dependencies, inflated variances, and heteroscedasticity among the calibrated…
In the regression setting, given a set of hyper-parameters, a model-estimation procedure constructs a model from training data. The optimal hyper-parameters that minimize generalization error of the model are usually unknown. In practice…
In statistical exercises where there are several candidate models, the traditional approach is to select one model using some data driven criterion and use that model for estimation, testing and other purposes, ignoring the variability of…
Covariance matrix estimation, a classical statistical topic, poses significant challenges when the sample size is comparable to or smaller than the number of features. In this paper, we frame covariance matrix estimation as a compound…
The effect of correlations between model parameters and nuisance parameters is discussed, in the context of fitting model parameters to data. Modifications to the usual $\chi^2$ method are required. Fake data studies, as used at present,…
Having a sufficient quantity of quality data is a critical enabler of training effective machine learning models. Being able to effectively determine the adequacy of a dataset prior to training and evaluating a model's performance would be…
This paper discusses some problems that may arise when approximating, via Monte Carlo simulations, the distributions of goodness-of-fit test statistics based on the empirical distribution function. We argue that failing to re-estimate…
We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight…
How should researchers analyze randomized experiments in which the main outcome is latent and measured in multiple ways but each measure contains some degree of error? We first identify a critical study-specific noncomparability problem in…
A probabilistic model is said to be calibrated if its predicted probabilities match the corresponding empirical frequencies. Calibration is important for uncertainty quantification and decision making in safety-critical applications. While…
We give an analytical interpretation of how subsample-based internal covariance estimators lead to biased estimates of the covariance, due to underestimating the super-sample covariance (SSC). This includes the jackknife and bootstrap…
Empirical relationships are derived for the expected sampling error of quantile estimations using Monte Carlo experiments for two frequency distributions frequently encountered in climate sciences. The relationships found are expressed as a…