Related papers: Optimal Data Split Methodology for Model Validatio…

Network cross-validation by edge sampling

While many statistical models and methods are now available for network analysis, resampling network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but is not directly…

Methodology · Statistics 2020-05-04 Tianxi Li, Elizaveta Levina, Ji Zhu

An Algorithm for Optimal Partitioning of Data on an Interval

Many signal processing problems can be solved by maximizing the fitness of a segmented model over all possible partitions of the data interval. This letter describes a simple but powerful algorithm that searches the exponentially large…

Numerical Analysis · Mathematics 2025-10-20 Brad Jackson, Jeffrey D. Scargle, David Barnes, Sundararajan Arabhi, Alina Alt, Peter Gioumousis, Elyus Gwin, Paungkaew Sangtrakulcharoen, Linda Tan, Tun Tao Tsai

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei

Clustering-Based Validation Splits for Model Selection under Domain Shift

This paper considers the problem of model selection under domain shift. Motivated by principles from distributionally robust optimisation and domain adaptation theory, it is proposed that the training-validation split should maximise the…

Machine Learning · Computer Science 2025-08-19 Andrea Napoli, Paul White

Optimal Design of Validation Experiments for the Prediction of Quantities of Interest

Numerical predictions of quantities of interest measured within physical systems rely on the use of mathematical models that should be validated, or at best, not invalidated. Model validation usually involves the comparison of experimental…

Computational Engineering, Finance, and Science · Computer Science 2023-07-19 Antonin Paquette-Rufiange, Serge Prudhomme, Marc Laforest

Automated Data Slicing for Model Validation: A Big data - AI Integration Approach

As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models. However, current data tools are still primitive when it comes to helping users trace model performance problems all…

Databases · Computer Science 2019-01-08 Yeounoh Chung, Tim Kraska, Neoklis Polyzotis, Ki Hyun Tae, Steven Euijong Whang

Variable Partitioning for Distributed Optimization

This paper is about how to partition decision variables while decomposing a large-scale optimization problem for the best performance of distributed solution methods. Solving a large-scale optimization problem sequentially can be…

Optimization and Control · Mathematics 2017-10-26 Yuchen Zheng, Ilbin Lee, Nicoleta Serban

Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating…

Machine Learning · Computer Science 2022-10-03 Anastasios N. Angelopoulos, Stephen Bates, Emmanuel J. Candès, Michael I. Jordan, Lihua Lei

Algorithm for Model Validation: Theory and Applications

Validation is often defined as the process of determining the degree to which a model is an accurate representation of the real world from the perspective of its intended uses. Validation is crucial as industries and governments depend…

Data Analysis, Statistics and Probability · Physics 2015-06-26 D. Sornette, A. B. Davis, K. Ide, K. R. Vixie, V. Pisarenko, J. R. Kamm

Consistent Estimation for Partition-wise Regression and Classification Models

Partition-wise models offer a flexible approach for modeling complex and multidimensional data that are capable of producing interpretable results. They are based on partitioning the observed data into regions, each of which is modeled with…

Methodology · Statistics 2017-06-07 Rex C. Y. Cheung, Alexander Aue, Thomas C. M. Lee

Computer model validation with functional output

A key question in evaluation of computer models is: Does the computer model adequately represent reality? A six-step process for computer model validation is set out in Bayarri et al. [Technometrics 49 (2007) 138–154] (and briefly…

Methodology · Statistics 2009-09-29 M. J. Bayarri, J. O. Berger, J. Cafeo, G. Garcia-Donato, F. Liu, J. Palomo, R. J. Parthasarathy, R. Paulo, J. Sacks, D. Walsh

Testing Calibration in Nearly-Linear Time

In the recent literature on machine learning and decision making, calibration has emerged as a desirable and widely-studied statistical property of the outputs of binary prediction models. However, the algorithmic aspects of measuring model…

Machine Learning · Computer Science 2024-06-24 Lunjia Hu, Arun Jambulapati, Kevin Tian, Chutong Yang

SPlit: An Optimal Method for Data Splitting

In this article we propose an optimal method referred to as SPlit for splitting a dataset into training and testing sets. SPlit is based on the method of Support Points (SP), which was initially developed for finding the optimal…

Machine Learning · Statistics 2021-05-10 V. Roshan Joseph, Akhil Vakayil

Non-Convex Split Feasibility Problems: Models, Algorithms and Theory

In this paper, we propose a catalog of iterative methods for solving the Split Feasibility Problem in the non-convex setting. We study four different optimization formulations of the problem, where each model has advantages in different…

Optimization and Control · Mathematics 2020-10-12 Aviv Gibali, Shoham Sabach, Sergey Voldman

Does Data Splitting Improve Prediction?

Data splitting divides data into two parts. One part is reserved for model selection. In some applications, the second part is used for model validation but we use this part for estimating the parameters of the chosen model. We focus on the…

Methodology · Statistics 2016-01-20 Julian J. Faraway

Cross-validation: what does it estimate and how well does it do it?

Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit…

Methodology · Statistics 2024-03-12 Stephen Bates, Trevor Hastie, Robert Tibshirani

Nested cross-validation when selecting classifiers is overzealous for most practical applications

When selecting a classification algorithm to be applied to a particular problem, one has to simultaneously select the best algorithm for that dataset and the best set of hyperparameters for the chosen model. The usual approach is to…

Machine Learning · Computer Science 2018-09-26 Jacques Wainer, Gavin Cawley

A New Flexible Train-Test Split Algorithm, an approach for choosing among the Hold-out, K-fold cross-validation, and Hold-out iteration

Choosing an appropriate strategy for partitioning data into training and evaluation sets is a critical step in machine learning, yet validation methods are often selected using default or conventional settings without considering their…

Machine Learning · Computer Science 2026-01-05 Zahra Bami, Ali Behnampour, Aniruddha Bora, Hassan Doosti

A survey of cross-validation procedures for model selection

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of…

Statistics Theory · Mathematics 2011-02-01 Sylvain Arlot, Alain Celisse

Improved Image Segmentation via Cost Minimization of Multiple Hypotheses

Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, these algorithms have a collection of parameters that control the degree…

Computer Vision and Pattern Recognition · Computer Science 2018-02-02 Marc Bosch, Christopher M. Gifford, Austin G. Dress, Clare W. Lau, Jeffrey G. Skibo, Gordon A. Christie