Related papers: Optimal Data Split Methodology for Model Validatio…
While many statistical models and methods are now available for network analysis, resampling network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but is not directly…
Many signal processing problems can be solved by maximizing the fitness of a segmented model over all possible partitions of the data interval. This letter describes a simple but powerful algorithm that searches the exponentially large…
Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…
This paper considers the problem of model selection under domain shift. Motivated by principles from distributionally robust optimisation and domain adaptation theory, it is proposed that the training-validation split should maximise the…
Numerical predictions of quantities of interest measured within physical systems rely on the use of mathematical models that should be validated, or at best, not invalidated. Model validation usually involves the comparison of experimental…
As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models. However, current data tools are still primitive when it comes to helping users trace model performance problems all…
This paper is about how to partition decision variables while decomposing a large-scale optimization problem for the best performance of distributed solution methods. Solving a large-scale optimization problem sequen- tially can be…
We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating…
Validation is often defined as the process of determining the degree to which a model is an accurate representation of the real world from the perspective of its intended uses. Validation is crucial as industries and governments depend…
Partition-wise models offer a flexible approach for modeling complex and multidimensional data that are capable of producing interpretable results. They are based on partitioning the observed data into regions, each of which is modeled with…
A key question in evaluation of computer models is Does the computer model adequately represent reality? A six-step process for computer model validation is set out in Bayarri et al. [Technometrics 49 (2007) 138--154] (and briefly…
In the recent literature on machine learning and decision making, calibration has emerged as a desirable and widely-studied statistical property of the outputs of binary prediction models. However, the algorithmic aspects of measuring model…
In this article we propose an optimal method referred to as SPlit for splitting a dataset into training and testing sets. SPlit is based on the method of Support Points (SP), which was initially developed for finding the optimal…
In this paper, we propose a catalog of iterative methods for solving the Split Feasibility Problem in the non-convex setting. We study four different optimization formulations of the problem, where each model has advantageous in different…
Data splitting divides data into two parts. One part is reserved for model selection. In some applications, the second part is used for model validation but we use this part for estimating the parameters of the chosen model. We focus on the…
Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit…
When selecting a classification algorithm to be applied to a particular problem, one has to simultaneously select the best algorithm for that dataset \emph{and} the best set of hyperparameters for the chosen model. The usual approach is to…
Choosing an appropriate strategy for partitioning data into training and evaluation sets is a critical step in machine learning, yet validation methods are often selected using default or conventional settings without considering their…
Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of…
Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, these algorithms have a collection of parameters that control the degree…