Related papers: Selective Correlations - the conditional estimator…
In this paper we consider the problem of constructing confidence intervals for coefficients of martingale regression models (in particular, time series models) after variable selection. Although constructing confidence intervals is common…
Model selection aims to identify a sufficiently well performing model that is possibly simpler than the most complex model among a pool of candidates. However, the decision-making process itself can inadvertently introduce non-negligible…
We propose a new optimization framework for aleatoric uncertainty estimation in regression problems. Existing methods can quantify the error in the target estimation, but they tend to underestimate it. To obtain the predictive uncertainty…
As predictive algorithms grow in popularity, using the same dataset to both train and test a new model has become routine across research, policy, and industry. Sample-splitting attains valid inference on model properties by using separate…
We develop a general approach to valid inference after model selection. At the core of our framework is a result that characterizes the distribution of a post-selection estimator conditioned on the selection event. We specialize the…
We consider the problem of providing valid inference for a selected parameter in a sparse regression setting. It is well known that classical regression tools can be unreliable in this context due to the bias generated in the selection…
Post-selection inference consists in providing statistical guarantees, based on a data set, that are robust to a prior model selection step on the same data set. In this paper, we address an instance of the post-selection-inference problem,…
Selective inference is the problem of giving valid answers to statistical questions chosen in a data-driven manner. A standard solution to selective inference is simultaneous inference, which delivers valid answers to the set of all…
When using dyadic data (i.e., data indexed by pairs of units), researchers typically assume a linear model, estimate it using Ordinary Least Squares and conduct inference using "dyadic-robust" variance estimators. The latter assumes that…
Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory which assumes a fixed set of covariates in the…
Prediction intervals are a machine- and human-interpretable way to represent predictive uncertainty in a regression analysis. In this paper, we present a method for generating prediction intervals along with point estimates from an ensemble…
Survey sampling is concerned with the estimation of finite population parameters. In practice, survey data suffer from item nonresponse, which is commonly handled through imputation, i.e., replacing missing values with predicted values. As…
A statistical estimation model with qualitative input provides a mechanism to fuse human intuition in the form of qualitative information into a statistical model. We investigate the statistical properties of this model and devise a…
In this article we present a very intuitive, easy-to-follow, yet mathematically rigorous approach to the so-called data fitting process. Rather than minimizing the distance between measured and simulated data points, we prefer to find such…
Plasticity is one of the most important properties of the nervous system, which enables animals to adjust their behavior to the ever-changing external environment. Changes in synaptic efficacy between neurons constitute one of the major…
Selective inference (post-selection inference) is a methodology that has attracted much attention in recent years in the fields of statistics and machine learning. Naive inference based on data that are also used for model selection tends…
Spurious correlations occur when a model learns unreliable features from the data and are a well-known drawback of data-driven learning. Although several algorithms have been proposed to mitigate them, we are yet to jointly derive the…
Selective inference methods are developed for group lasso estimators for use with a wide class of distributions and loss functions. The method includes the use of exponential family distributions, as well as quasi-likelihood modeling for…
This paper addresses the problem of selective classification for deep neural networks, where a model is allowed to abstain from low-confidence predictions to avoid potential errors. We focus on so-called post-hoc methods, which replace the…
Panels with large time $(T)$ and cross-sectional $(N)$ dimensions are a key data structure in social sciences and other fields. A central question in panel data analysis is whether to pool data across individuals or to estimate separate…