Related papers: Model selection by resampling penalization
In this paper, a new family of resampling-based penalization procedures for model selection is defined in a general framework. It generalizes several methods, including Efron's bootstrap penalization and the leave-one-out penalization…
We investigate the optimality for model selection of the so-called slope heuristics, $V$-fold cross-validation and $V$-fold penalization in a heteroscedastic with random design regression context. We consider a new class of linear models…
We consider the problem of choosing between several models in least-squares regression with heteroscedastic data. We prove that any penalization procedure is suboptimal when the penalty is a function of the dimension of the model, at least…
We consider the estimation of a regression function with random design and heteroscedastic noise in a nonparametric setting. More precisely, we address the problem of characterizing the optimal penalty when the regression function is…
In the regression setting, given a set of hyper-parameters, a model-estimation procedure constructs a model from training data. The optimal hyper-parameters that minimize generalization error of the model are usually unknown. In practice…
We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call ``V-fold penalization''. Considering a particular (though simple) regression…
We build penalized least-squares estimators using the slope heuristic and resampling penalties. We prove oracle inequalities for the selected estimator with leading constant asymptotically equal to 1. We compare the practical performances…
Penalized smoothing is a standard tool in regression analysis. Classical approaches often rely on basis or kernel expansions, which constrain the estimator to a fixed span and impose smoothness assumptions that may be restrictive for…
In recent years the importance of finding a meaningful pattern from huge datasets has become more challenging. Data miners try to adopt innovative methods to face this problem by applying feature selection methods. In this paper we propose…
Variable selection for models including interactions between explanatory variables often needs to obey certain hierarchical constraints. The weak or strong structural hierarchy requires that the existence of an interaction term implies at…
Model selection is a crucial issue in machine-learning and a wide variety of penalisation methods (with possibly data dependent complexity penalties) have recently been introduced for this purpose. However their empirical performance is…
This paper tackles the problem of selecting among several linear estimators in non-parametric regression; this includes model selection for linear regression, the choice of a regularization parameter in kernel ridge regression, spline…
We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic…
In this article, model selection via penalized empirical loss minimization in nonparametric classification problems is studied. Data-dependent penalties are constructed, which are based on estimates of the complexity of a small subclass of…
While many statistical models and methods are now available for network analysis, resampling network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but is not directly…
We propose to address the common problem of linear estimation in linear statistical models by using a model selection approach via penalization. Depending then on the framework in which the linear statistical model is considered namely the…
This article investigates unsupervised classification techniques for categorical multivariate data. The study employs multivariate multinomial mixture modeling, which is a type of model particularly applicable to multilocus genotypic data.…
For many practical problems, the regression models follow the strong heredity property (also known as the marginality), which means they include parent main effects when a second-order effect is present. Existing methods rely mostly on…
This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares loss of the selected estimator. We first…
Model selection and sparse recovery are two important problems for which many regularization methods have been proposed. We study the properties of regularization methods in both problems under the unified framework of regularized least…