Related papers: Model selection by resampling penalization

Model selection by resampling penalization

In this paper, a new family of resampling-based penalization procedures for model selection is defined in a general framework. It generalizes several methods, including Efron's bootstrap penalization and the leave-one-out penalization…

Statistics Theory · Mathematics 2009-06-19 Sylvain Arlot

Slope heuristics and V-Fold model selection in heteroscedastic regression using strongly localized bases

We investigate the optimality for model selection of the so-called slope heuristics, $V$-fold cross-validation and $V$-fold penalization in a heteroscedastic regression context with random design. We consider a new class of linear models…

Statistics Theory · Mathematics 2023-03-08 Fabien Navarro, Adrien Saumard

Choosing a penalty for model selection in heteroscedastic regression

We consider the problem of choosing between several models in least-squares regression with heteroscedastic data. We prove that any penalization procedure is suboptimal when the penalty is a function of the dimension of the model, at least…

Statistics Theory · Mathematics 2010-07-28 Sylvain Arlot

The Slope Heuristics in Heteroscedastic Regression

We consider the estimation of a regression function with random design and heteroscedastic noise in a nonparametric setting. More precisely, we address the problem of characterizing the optimal penalty when the regression function is…

Statistics Theory · Mathematics 2015-06-29 Adrien Saumard

An analysis of the cost of hyper-parameter selection via split-sample validation, with applications to penalized regression

In the regression setting, given a set of hyper-parameters, a model-estimation procedure constructs a model from training data. The optimal hyper-parameters that minimize generalization error of the model are usually unknown. In practice…

Machine Learning · Statistics 2019-04-01 Jean Feng, Noah Simon

V-fold cross-validation improved: V-fold penalization

We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call "V-fold penalization". Considering a particular (though simple) regression…

Statistics Theory · Mathematics 2008-02-07 Sylvain Arlot

Optimal model selection in density estimation

We build penalized least-squares estimators using the slope heuristic and resampling penalties. We prove oracle inequalities for the selected estimator with leading constant asymptotically equal to 1. We compare the practical performances…

Statistics Theory · Mathematics 2015-03-13 Matthieu Lerasle

Noise-resilient penalty operators based on statistical differentiation schemes

Penalized smoothing is a standard tool in regression analysis. Classical approaches often rely on basis or kernel expansions, which constrain the estimator to a fixed span and impose smoothness assumptions that may be restrictive for…

Statistics Theory · Mathematics 2026-01-19 Marc Vidal, Yves Rosseel

Improving Performance of a Group of Classification Algorithms Using Resampling and Feature Selection

In recent years, finding meaningful patterns in huge datasets has become more challenging. Data miners try to address this problem by applying feature selection methods. In this paper we propose…

Machine Learning · Computer Science 2014-03-11 Mehdi Naseriparsa, Amir-masoud Bidgoli, Touraj Varaee

Group Regularized Estimation under Structural Hierarchy

Variable selection for models including interactions between explanatory variables often needs to obey certain hierarchical constraints. The weak or strong structural hierarchy requires that the existence of an interaction term implies at…

Statistics Theory · Mathematics 2016-11-10 Yiyuan She, Zhifeng Wang, He Jiang

An Empirical Comparison of V-fold Penalisation and Cross Validation for Model Selection in Distribution-Free Regression

Model selection is a crucial issue in machine learning, and a wide variety of penalisation methods (with possibly data-dependent complexity penalties) have recently been introduced for this purpose. However their empirical performance is…

Machine Learning · Statistics 2012-12-11 Charanpal Dhanjal, Nicolas Baskiotis, Stéphan Clémençon, Nicolas Usunier

Data-driven calibration of linear estimators with minimal penalties

This paper tackles the problem of selecting among several linear estimators in non-parametric regression; this includes model selection for linear regression, the choice of a regularization parameter in kernel ridge regression, spline…

Statistics Theory · Mathematics 2011-09-15 Sylvain Arlot, Francis Bach

High-dimensional classification by sparse logistic regression

We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic…

Statistics Theory · Mathematics 2018-11-20 Felix Abramovich, Vadim Grinshtein

Complexity regularization via localized random penalties

In this article, model selection via penalized empirical loss minimization in nonparametric classification problems is studied. Data-dependent penalties are constructed, which are based on estimates of the complexity of a small subclass of…

Statistics Theory · Mathematics 2007-06-13 Gabor Lugosi, Marten Wegkamp

Network cross-validation by edge sampling

While many statistical models and methods are now available for network analysis, resampling network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but is not directly…

Methodology · Statistics 2020-05-04 Tianxi Li, Elizaveta Levina, Ji Zhu

Non-asymptotic model selection for linear non least-squares estimation in regression models and inverse problems

We propose to address the common problem of linear estimation in linear statistical models by using a model selection approach via penalization. Depending then on the framework in which the linear statistical model is considered, namely the…

Statistics Theory · Mathematics 2009-09-11 Ikhlef Bechar

Clustering and variable selection for categorical multivariate data

This article investigates unsupervised classification techniques for categorical multivariate data. The study employs multivariate multinomial mixture modeling, which is a type of model particularly applicable to multilocus genotypic data.…

Statistics Theory · Mathematics 2014-03-11 Dominique Bontemps, Wilson Toussile

An Easy-to-Implement Hierarchical Standardization for Variable Selection Under Strong Heredity Constraint

For many practical problems, the regression models follow the strong heredity property (also known as the marginality), which means they include parent main effects when a second-order effect is present. Existing methods rely mostly on…

Methodology · Statistics 2020-07-28 Kedong Chen, William Li, Sijian Wang

Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation

This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares loss of the selected estimator. We first…

Statistics Theory · Mathematics 2015-10-13 Sylvain Arlot, Matthieu Lerasle

A unified approach to model selection and sparse recovery using regularized least squares

Model selection and sparse recovery are two important problems for which many regularization methods have been proposed. We study the properties of regularization methods in both problems under the unified framework of regularized least…

Statistics Theory · Mathematics 2009-09-03 Jinchi Lv, Yingying Fan