Related papers: Data Consistency Approach to Model Validation

A Goodness-of-Fit Test for Statistical Models

Statistical modeling plays a fundamental role in understanding the underlying mechanism of massive data (statistical inference) and predicting the future (statistical prediction). Although all models are wrong, researchers try their best to…

Methodology · Statistics 2020-06-17 Hangjin Jiang

Calibrated inference: statistical inference that accounts for both sampling uncertainty and distributional uncertainty

How can we draw trustworthy scientific conclusions? One criterion is that a study can be replicated by independent teams. While replication is critically important, it is arguably insufficient. If a study is biased for some reason and other…

Methodology · Statistics 2025-02-06 Yujin Jeong, Dominik Rothenhäusler

A Set of Rules for Model Validation

The validation of a data-driven model is the process of assessing the model's ability to generalize to new, unseen data in the population of interest. This paper proposes a set of general rules for model validation. These rules are designed…

Methodology · Statistics 2026-01-30 José Camacho

How consistent is my model with the data? Information-Theoretic Model Check

The choice of model class is fundamental in statistical learning and system identification, no matter whether the class is derived from physical principles or is a generic black-box. We develop a method to evaluate the specified model class…

Machine Learning · Statistics 2017-12-20 Andreas Svensson, Dave Zachariah, Thomas B. Schön

Data-driven goodness-of-fit tests

We propose and study a general method for construction of consistent statistical tests on the basis of possibly indirect, corrupted, or partially available observations. The class of tests devised in the paper contains Neyman's smooth…

Statistics Theory · Mathematics 2017-09-22 Mikhail Langovoy

Coherent Frameworks for Statistical Inference serving Integrating Decision Support Systems

A subjective expected utility policy making centre, managing complex, dynamic systems, needs to draw on the expertise of a variety of disparate panels of experts and integrate this information coherently. To achieve this, diverse supporting…

Methodology · Statistics 2015-12-21 Jim Q. Smith, Martine J. Barons, Manuele Leonelli

Anytime-Valid Linear Models and Regression Adjusted Causal Inference in Randomized Experiments

Linear models are foundational tools in statistics and ubiquitous across the applied sciences. However, conventional statistical inference -- such as $t$-tests and $F$-tests -- are only valid at fixed sample sizes, making them unsuitable…

Methodology · Statistics 2025-07-08 Michael Lindon, Dae Woong Ham, Martin Tingley, Iavor Bojinov

An updated look on the convergence and consistency of data-driven dynamical models

Deep sequence models are receiving significant interest in current machine learning research. By representing probability distributions that are fit to data using maximum likelihood estimation, such models can model data on general…

Systems and Control · Electrical Eng. & Systems 2024-09-09 Kristian Løvland, Bjarne Grimstad, Lars Struen Imsland

Consistency Tests for Comparing Astrophysical Models and Observations

In astronomy, there is an opportunity to enhance the practice of validating models through statistical techniques, specifically to account for measurement error uncertainties. While models are commonly used to describe observations, there…

Instrumentation and Methods for Astrophysics · Physics 2023-08-09 Fiorenzo Stoppa, Eric Cator, Gijs Nelemans

Valid inferential models for prediction in supervised learning problems

Prediction, where observed data is used to quantify uncertainty about a future observation, is a fundamental problem in statistics. Prediction sets with coverage probability guarantees are a common solution, but these do not provide…

Statistics Theory · Mathematics 2022-11-22 Leonardo Cella, Ryan Martin

One Step to Efficient Synthetic Data

A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators and whose joint distribution is inconsistent with the true…

Statistics Theory · Mathematics 2026-02-18 Jordan Awan, Zhanrui Cai

Preserving Statistical Validity in Adaptive Data Analysis

A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques, to deep statistical methods for controlling the false discovery rate in multiple…

Machine Learning · Computer Science 2016-03-03 Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth

Knowledge-based model validation using a custom metric

Vehicle models have a long history of research and as of today are able to model the involved physics in a reasonable manner. However, each new vehicle has its new characteristics or parameters. The identification of these is the main task…

Computational Engineering, Finance, and Science · Computer Science 2024-12-11 Nicola Henkelmann, Stephan Rhode, Johannes von Keler

Stable but Wrong: When More Data Degrades Scientific Conclusions

Modern science increasingly relies on ever-growing observational datasets and automated inference pipelines, under the implicit belief that accumulating more data makes scientific conclusions more reliable. Here we show that this belief can…

Machine Learning · Computer Science 2026-02-06 Zhipeng Zhang, Kai Li

Robustness Metric for Quantifying Causal Model Confidence and Parameter Uncertainty

Many methods of estimating causal models do not provide estimates of confidence in the resulting model. In this work, a metric is proposed for validating the output of a causal model fit; the robustness of the model structure with resampled…

Methodology · Statistics 2016-02-09 Garrett Waycaster, Christian Bes, Volodymyr Bilotkach, Christian Gogu, Raphael Haftka, Nam-Ho Kim

Strong Faithfulness and Uniform Consistency in Causal Inference

A fundamental question in causal inference is whether it is possible to reliably infer manipulation effects from observational data. There are a variety of senses of asymptotic reliability in the statistical literature, among which the most…

Artificial Intelligence · Computer Science 2012-12-12 Jiji Zhang, Peter L. Spirtes

Formal and Informal Model Selection with Incomplete Data

Model selection and assessment with incomplete data pose challenges in addition to the ones encountered with complete data. There are two main reasons for this. First, many models describe characteristics of the complete data, in spite of…

Methodology · Statistics 2008-08-28 Geert Verbeke, Geert Molenberghs, Caroline Beunckens

Evaluating the Success of a Data Analysis

A fundamental problem in the practice and teaching of data science is how to evaluate the quality of a given data analysis, which is different than the evaluation of the science or question underlying the data analysis. Previously, we…

Other Statistics · Statistics 2019-04-29 Stephanie C. Hicks, Roger D. Peng

Model selection in the average of inconsistent data: an analysis of the measured Planck-constant values

When the data do not conform to the hypothesis of a known sampling-variance, the fitting of a constant to a set of measured values is a long debated problem. Given the data, fitting would require to find what measurand value is the most…

Data Analysis, Statistics and Probability · Physics 2020-07-21 Giovanni Mana, Enrico Massa, Maria Predescu

Causal Discovery of Linear Cyclic Models from Multiple Experimental Data Sets with Overlapping Variables

Much of scientific data is collected as randomized experiments intervening on some and observing other variables of interest. Quite often, a given phenomenon is investigated in several studies, and different sets of variables are involved…

Methodology · Statistics 2012-10-19 Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer