Related papers: Sample size effects in multivariate fitting of cor…

Sample Size and Bias Approximations For Continuous Exposures Measured with Error

Measurement error is a pervasive challenge across many disciplines, yet its impact on sample size determination and the accuracy and precision of estimators regarding the association between an exposure and an outcome remains understudied…

Methodology · Statistics 2025-05-27 Honghyok Kim

Uses of Sub-sample Estimates to Reduce Errors in Stochastic Optimization Models

Optimization software enables the solution of problems with millions of variables and associated parameters. These parameters are, however, often uncertain and represented with an analytical description of the parameter's distribution or…

Optimization and Control · Mathematics 2025-01-17 John R. Birge

On the average of inconsistent data

When data do not conform to the hypothesis of a known sampling-variance, the fitting of a constant to the set of measured values is a long debated problem. Given the data, the fitting would require to find which measurand value is most…

Data Analysis, Statistics and Probability · Physics 2011-09-27 Giovanni Mana, Maria Mirabela Predescu

Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance

Estimation of causal effects using machine learning methods has become an active research field in econometrics. In this paper, we study the finite sample performance of meta-learners for estimation of heterogeneous treatment effects under…

Econometrics · Economics 2022-02-01 Gabriel Okasa

Uniform inference in linear mixed models

We provide finite-sample distribution approximations, that are uniform in the parameter, for inference in linear mixed models. Focus is on variances and covariances of random effects in cases where existing theory fails because their…

Statistics Theory · Mathematics 2025-07-29 Karl Oskar Ekvall, Matteo Bottai

Error estimation and reduction with cross correlations

Besides the well-known effect of autocorrelations in time series of Monte Carlo simulation data resulting from the underlying Markov process, using the same data pool for computing various estimates entails additional cross correlations.…

Statistical Mechanics · Physics 2014-11-20 Martin Weigel, Wolfhard Janke

Size effects in statistical fracture

We review statistical theories and numerical methods employed to consider the sample size dependence of the failure strength distribution of disordered materials. We first overview the analytical predictions of extreme value statistics and…

Materials Science · Physics 2015-05-13 Mikko J. Alava, Phani K. V. V. Nukala, Stefano Zapperi

On Stochastic Error and Computational Efficiency of the Markov Chain Monte Carlo Method

In Markov Chain Monte Carlo (MCMC) simulations, the thermal equilibria quantities are estimated by ensemble average over a sample set containing a large number of correlated samples. These samples are selected in accordance with the…

Data Analysis, Statistics and Probability · Physics 2015-01-08 J. Li, P. Vignal, S. Sun, V. M. Calo

Anomalies in the Analysis of Calibrated Data

This study examines effects of calibration errors on model assumptions and data-analytic tools in direct calibration assays. These effects encompass induced dependencies, inflated variances, and heteroscedasticity among the calibrated…

Statistics Theory · Mathematics 2011-03-30 D. R. Jensen, D. E. Ramirez

An analysis of the cost of hyper-parameter selection via split-sample validation, with applications to penalized regression

In the regression setting, given a set of hyper-parameters, a model-estimation procedure constructs a model from training data. The optimal hyper-parameters that minimize generalization error of the model are usually unknown. In practice…

Machine Learning · Statistics 2019-04-01 Jean Feng, Noah Simon

Risk and resampling under model uncertainty

In statistical exercises where there are several candidate models, the traditional approach is to select one model using some data driven criterion and use that model for estimation, testing and other purposes, ignoring the variability of…

Statistics Theory · Mathematics 2008-12-18 Snigdhansu Chatterjee, Nitai D. Mukhopadhyay

An Empirical Bayes Jackknife Regression Framework for Covariance Matrix Estimation

Covariance matrix estimation, a classical statistical topic, poses significant challenges when the sample size is comparable to or smaller than the number of features. In this paper, we frame covariance matrix estimation as a compound…

Methodology · Statistics 2025-03-04 Huqin Xin, Sihai Dave Zhao

Effect of Correlations Between Model Parameters and Nuisance Parameters When Model Parameters are Fit to Data

The effect of correlations between model parameters and nuisance parameters is discussed, in the context of fitting model parameters to data. Modifications to the usual $\chi^2$ method are required. Fake data studies, as used at present,…

Data Analysis, Statistics and Probability · Physics 2013-09-25 Byron Roe

Exploring the Impact of Dataset Statistical Effect Size on Model Performance and Data Sample Size Sufficiency

Having a sufficient quantity of quality data is a critical enabler of training effective machine learning models. Being able to effectively determine the adequacy of a dataset prior to training and evaluating a model's performance would be…

Machine Learning · Computer Science 2026-04-28 Arya Hatamian, Lionel Levine, Haniyeh Ehsani Oskouie, Majid Sarrafzadeh

On approximating the distributions of goodness-of-fit test statistics based on the empirical distribution function: The case of unknown parameters

This paper discusses some problems possibly arising when approximating via Monte-Carlo simulations the distributions of goodness-of-fit test statistics based on the empirical distribution function. We argue that failing to re-estimate…

Data Analysis, Statistics and Probability · Physics 2008-04-01 Marco Capasso, Lucia Alessi, Matteo Barigozzi, Giorgio Fagiolo

Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression

We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight…

Machine Learning · Statistics 2024-11-04 Lucas Clarté, Adrien Vandenbroucque, Guillaume Dalle, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

Causal Inference for Experiments with Latent Outcomes: Key Results and Their Implications for Design and Analysis

How should researchers analyze randomized experiments in which the main outcome is latent and measured in multiple ways but each measure contains some degree of error? We first identify a critical study-specific noncomparability problem in…

Econometrics · Economics 2026-01-13 Jiawei Fu, Donald P. Green

Unsupervised Calibration under Covariate Shift

A probabilistic model is said to be calibrated if its predicted probabilities match the corresponding empirical frequencies. Calibration is important for uncertainty quantification and decision making in safety-critical applications. While…

Machine Learning · Computer Science 2020-07-01 Anusri Pampari, Stefano Ermon

Inadequacy of internal covariance estimation for super-sample covariance

We give an analytical interpretation of how subsample-based internal covariance estimators lead to biased estimates of the covariance, due to underestimating the super-sample covariance (SSC). This includes the jackknife and bootstrap…

Cosmology and Nongalactic Astrophysics · Physics 2018-04-16 Fabien Lacasa, Martin Kunz

Sampling errors of quantile estimations from finite samples of data

Empirical relationships are derived for the expected sampling error of quantile estimations using Monte Carlo experiments for two frequency distributions frequently encountered in climate sciences. The relationships found are expressed as a…

Methodology · Statistics 2016-10-12 Philippe Roy, René Laprise, Philippe Gachon