Related papers: Model selection by resampling penalization
We present a new family of model selection algorithms based on the resampling heuristics. It can be used in several frameworks, does not require any knowledge about the unknown law of the data, and may be seen as a generalization of local…
We consider the problem of choosing between several models in least-squares regression with heteroscedastic data. We prove that any penalization procedure is suboptimal when the penalty is a function of the dimension of the model, at least…
We consider the estimation of a regression function with random design and heteroscedastic noise in a nonparametric setting. More precisely, we address the problem of characterizing the optimal penalty when the regression function is…
Bootstrap techniques (also called resampling computation techniques) have introduced new advances in modeling and model evaluation. Using resampling methods to construct a series of new samples which are based on the original data set,…
We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call "V-fold penalization". Considering a particular (though simple) regression…
We consider penalized extremum estimation of a high-dimensional, possibly nonlinear model that is sparse in the sense that most of its parameters are zero but some are not. We use the SCAD penalty function, which provides model selection…
We investigate the optimality for model selection of the so-called slope heuristics, $V$-fold cross-validation and $V$-fold penalization in a heteroscedastic regression context with random design. We consider a new class of linear models…
Penalized regression has become a standard tool for model building across a wide range of application domains. Common practice is to tune the amount of penalization to trade off bias and variance or to optimize some other measure of…
In a regression model, prediction is typically performed after model selection. The large variability in the model selection makes the prediction unstable. Thus, it is essential to reduce the variability in model selection and improve…
We consider a heteroscedastic regression model in which some of the regression coefficients are zero but it is not known which ones. Penalized quantile regression is a useful approach for analyzing such data. By allowing different…
Regularized regression approaches such as the Lasso have been widely adopted for constructing sparse linear models in high-dimensional datasets. A complexity in fitting these models is the tuning of the parameters which control the level of…
This article introduces lassopack, a suite of programs for regularized regression in Stata. lassopack implements lasso, square-root lasso, elastic net, ridge regression, adaptive lasso and post-estimation OLS. The methods are suitable for…
In statistical exercises where there are several candidate models, the traditional approach is to select one model using some data driven criterion and use that model for estimation, testing and other purposes, ignoring the variability of…
In the regression setting, given a set of hyper-parameters, a model-estimation procedure constructs a model from training data. The optimal hyper-parameters that minimize generalization error of the model are usually unknown. In practice…
Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel…
We build penalized least-squares estimators using the slope heuristic and resampling penalties. We prove oracle inequalities for the selected estimator with leading constant asymptotically equal to 1. We compare the practical performances…
One possible approach to tackle the class imbalance in classification tasks is to resample a training dataset, i.e., to drop some of its elements or to synthesize new ones. There exist several widely-used resampling methods. Recent research…
Transfer learning refers to the promising idea of initializing model fits based on pre-training on other data. We particularly consider regression modeling settings where parameter estimates from previous data can be used as anchoring…
We propose to address the common problem of linear estimation in linear statistical models by using a model selection approach via penalization. Depending then on the framework in which the linear statistical model is considered, namely the…
This paper examines the use of a residual bootstrap for bias correction in machine learning regression methods. Accounting for bias is an important obstacle in recent efforts to develop statistical inference for machine learning methods. We…