English
Related papers

Related papers: Optimal Data Split Methodology for Model Validatio…

200 papers

While many statistical models and methods are now available for network analysis, resampling network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but is not directly…

Methodology · Statistics 2020-05-04 Tianxi Li , Elizaveta Levina , Ji Zhu

Many signal processing problems can be solved by maximizing the fitness of a segmented model over all possible partitions of the data interval. This letter describes a simple but powerful algorithm that searches the exponentially large…

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei

This paper considers the problem of model selection under domain shift. Motivated by principles from distributionally robust optimisation and domain adaptation theory, it is proposed that the training-validation split should maximise the…

Machine Learning · Computer Science 2025-08-19 Andrea Napoli , Paul White

Numerical predictions of quantities of interest measured within physical systems rely on the use of mathematical models that should be validated, or at best, not invalidated. Model validation usually involves the comparison of experimental…

Computational Engineering, Finance, and Science · Computer Science 2023-07-19 Antonin Paquette-Rufiange , Serge Prudhomme , Marc Laforest

As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models. However, current data tools are still primitive when it comes to helping users trace model performance problems all…

Databases · Computer Science 2019-01-08 Yeounoh Chung , Tim Kraska , Neoklis Polyzotis , Ki Hyun Tae , Steven Euijong Whang

This paper is about how to partition decision variables while decomposing a large-scale optimization problem for the best performance of distributed solution methods. Solving a large-scale optimization problem sequen- tially can be…

Optimization and Control · Mathematics 2017-10-26 Yuchen Zheng , Ilbin Lee , Nicoleta Serban

We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating…

Machine Learning · Computer Science 2022-10-03 Anastasios N. Angelopoulos , Stephen Bates , Emmanuel J. Candès , Michael I. Jordan , Lihua Lei

Validation is often defined as the process of determining the degree to which a model is an accurate representation of the real world from the perspective of its intended uses. Validation is crucial as industries and governments depend…

Data Analysis, Statistics and Probability · Physics 2015-06-26 D. Sornette , A. B. Davis , K. Ide , K. R. Vixie , V. Pisarenko , J. R. Kamm

Partition-wise models offer a flexible approach for modeling complex and multidimensional data that are capable of producing interpretable results. They are based on partitioning the observed data into regions, each of which is modeled with…

Methodology · Statistics 2017-06-07 Rex C. Y. Cheung , Alexander Aue , Thomas C. M. Lee

A key question in evaluation of computer models is Does the computer model adequately represent reality? A six-step process for computer model validation is set out in Bayarri et al. [Technometrics 49 (2007) 138--154] (and briefly…

In the recent literature on machine learning and decision making, calibration has emerged as a desirable and widely-studied statistical property of the outputs of binary prediction models. However, the algorithmic aspects of measuring model…

Machine Learning · Computer Science 2024-06-24 Lunjia Hu , Arun Jambulapati , Kevin Tian , Chutong Yang

In this article we propose an optimal method referred to as SPlit for splitting a dataset into training and testing sets. SPlit is based on the method of Support Points (SP), which was initially developed for finding the optimal…

Machine Learning · Statistics 2021-05-10 V. Roshan Joseph , Akhil Vakayil

In this paper, we propose a catalog of iterative methods for solving the Split Feasibility Problem in the non-convex setting. We study four different optimization formulations of the problem, where each model has advantageous in different…

Optimization and Control · Mathematics 2020-10-12 Aviv Gibali , Shoham Sabach , Sergey Voldman

Data splitting divides data into two parts. One part is reserved for model selection. In some applications, the second part is used for model validation but we use this part for estimating the parameters of the chosen model. We focus on the…

Methodology · Statistics 2016-01-20 Julian J. Faraway

Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit…

Methodology · Statistics 2024-03-12 Stephen Bates , Trevor Hastie , Robert Tibshirani

When selecting a classification algorithm to be applied to a particular problem, one has to simultaneously select the best algorithm for that dataset \emph{and} the best set of hyperparameters for the chosen model. The usual approach is to…

Machine Learning · Computer Science 2018-09-26 Jacques Wainer , Gavin Cawley

Choosing an appropriate strategy for partitioning data into training and evaluation sets is a critical step in machine learning, yet validation methods are often selected using default or conventional settings without considering their…

Machine Learning · Computer Science 2026-01-05 Zahra Bami , Ali Behnampour , Aniruddha Bora , Hassan Doosti

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of…

Statistics Theory · Mathematics 2011-02-01 Sylvain Arlot , Alain Celisse

Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, these algorithms have a collection of parameters that control the degree…

Computer Vision and Pattern Recognition · Computer Science 2018-02-02 Marc Bosch , Christopher M. Gifford , Austin G. Dress , Clare W. Lau , Jeffrey G. Skibo , Gordon A. Christie
‹ Prev 1 2 3 10 Next ›