Related papers: Conditional predictive inference for stable algori…
Neural networks are among the most powerful nonlinear models used to address supervised learning problems. Similar to most machine learning algorithms, neural networks produce point predictions and do not provide any prediction interval…
This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact…
The field of distribution-free predictive inference provides tools for provably valid prediction without any assumptions on the distribution of the data, which can be paired with any regression algorithm to provide accurate and reliable…
We give a finite-sample analysis of predictive inference procedures after model selection in regression with random design. The analysis is focused on a statistically challenging scenario where the number of potentially important…
In a supervised learning problem, given a predicted value that is the output of some trained model, how can we quantify our uncertainty around this prediction? Distribution-free predictive inference aims to construct prediction intervals…
Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit…
This paper investigates the efficiency of the K-fold cross-validation (CV) procedure and a debiased version thereof as a means of estimating the generalization risk of a learning algorithm. We work under the general assumption of uniform…
Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…
In traditional k-fold cross-validation, each instance is used ($k-1$) times for training and once for testing, leading to redundancy that lets many instances disproportionately influence the learning phase. We introduce Irredundant $k$-fold…
Prediction sets based on full conformal prediction have seen an increasing interest in statistical learning due to their universal marginal coverage guarantees. However, practitioners have refrained from using it in applications for two…
We propose a robust method for constructing conditionally valid prediction intervals based on models for conditional distributions such as quantile and distribution regression. Our approach can be applied to important prediction problems…
We consider the problem of distribution-free predictive inference, with the goal of producing predictive coverage guarantees that hold conditionally rather than marginally. Existing methods such as conformal prediction offer marginal…
It is crucial to assess the predictive performance of a model to establish its practicality and relevance in real-world scenarios, particularly for high-dimensional data analysis. Among data splitting or resampling methods, cross-validation…
Conformal predictors are machine learning algorithms that output prediction sets that have a guarantee of marginal validity for finite samples with minimal distributional assumptions. This is a property that makes conformal predictors…
Despite ongoing theoretical research on cross-validation (CV), many theoretical questions remain widely open. This motivates our investigation into how properties of algorithm-distribution pairs can affect the choice for the number of folds…
While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy -- coming from robust statistics and optimization -- is thus…
Conformal Prediction methods have finite-sample distribution-free marginal coverage guarantees. However, they generally do not offer conditional coverage guarantees, which can be important for high-stakes decisions. In this paper, we…
The purpose of this paper is to propose methodologies for statistical inference of low-dimensional parameters with high-dimensional data. We focus on constructing confidence intervals for individual coefficients and linear combinations of…
Cross-validation is a popular non-parametric method for evaluating the accuracy of a predictive rule. The usefulness of cross-validation depends on the task we want to employ it for. In this note, I discuss a simple non-parametric setting,…
We revisit the problem of constructing predictive confidence sets for which we wish to obtain some type of conditional validity. We provide new arguments showing how ``split conformal'' methods achieve near desired coverage levels with high…