Related papers: Model Selection Confidence Sets by Likelihood Rati…
This paper studies the Model Selection Confidence Set (MSCS) methodology for univariate time series models involving autoregressive and moving average components, and applies it to study model selection uncertainty in the Italian…
A fundamental challenge in approximating an unknown density using finite Gaussian mixture models is selecting the number of mixture components, also known as order. Traditional approaches choose a single best model using information…
Classical model selection seeks to find a single model within a particular class that optimizes some pre-specified criteria, such as maximizing a likelihood or minimizing a risk. More recently, there has been an increased interest in model…
In most prediction and estimation situations, scientists consider various statistical models for the same problem, and naturally want to select amongst the best. Hansen et al. (2011) provide a powerful solution to this problem by the…
Given a universe of N assets, investors often form equally weighted portfolios (EWPs) by selecting subsets of assets. EWPs are simple, robust, and competitive out-of-sample, yet the uncertainty about which subset truly performs best is…
In this article, we introduce the concept of model confidence bounds (MCB) for variable selection in the context of nested models. Similarly to the endpoints in the familiar confidence interval for parameter estimation, the MCB identifies…
In complicated/nonlinear parametric models, it is generally hard to know whether the model parameters are point identified. We provide computationally attractive procedures to construct confidence sets (CSs) for identified sets of full…
The growing prevalence of unauthorized model usage and misattribution has increased the need for reliable model provenance analysis. However, existing methods largely rely on heuristic fingerprint-matching rules that lack provable error…
Certifiable, adaptive uncertainty estimates for unknown quantities are an essential ingredient of sequential decision-making algorithms. Standard approaches rely on problem-dependent concentration results and are limited to a specific…
Selective classification allows models to abstain from making predictions (e.g., say "I don't know") when in doubt in order to obtain better effective accuracy. While typical selective models can be effective at producing more accurate…
High-dimensional tests are applied to find relevant sets of variables and relevant models. If variables are selected by analyzing the sums of products matrices and a corresponding mean-value test is performed, there is the danger that the…
For regression model selection via maximum likelihood estimation, we adopt a vector representation of candidate models and study the likelihood ratio confidence region for the regression parameter vector of a full model. We show that when…
Large language models (LLMs) are increasingly used to simulate human behavior, but common practices to use LLM-generated data are inefficient. Treating an LLM's output ("model choice") as a single data point underutilizes the information…
When solving real-world problems, practitioners often hesitate to implement solutions obtained from mathematical models, especially for important decisions. This hesitation stems from practitioners' lack of trust in optimization models and…
This paper proposes a Conditional Method Confidence Set (CMCS) which allows to select the best subset of forecasting methods with equal predictive ability conditional on a specific economic regime. The test resembles the Model Confidence…
Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…
Testing of deep learning models is challenging due to the excessive number and complexity of computations involved. As a result, test data selection is performed manually and in an ad hoc way. This raises the question of how we can…
Model selection is a central task in statistics, but standard methods are not robust in misspecified settings where the true data-generating process (DGP) is not in the set of candidate models. The key limitation is that existing methods --…
Automated methods for discovering mechanistic simulator models from observational data offer a promising path toward accelerating scientific progress. Such methods often take the form of agentic-style iterative workflows that repeatedly…
In the deployment of large language models (LLMs), accurate confidence estimation is critical for assessing the credibility of model predictions. However, existing methods often fail to overcome the issue of overconfidence on incorrect…