Related papers: Stability Selection
In variable or graph selection problems, finding a right-sized model or controlling the number of false positives is notoriously difficult. Recently, a meta-algorithm called Stability Selection was proposed that can provide reliable…
Stability selection (Meinshausen and Buhlmann, 2010) makes any feature selection method more stable by returning only those features that are consistently selected across many subsamples. We prove (in what is, to our knowledge, the first…
Modern biotechnologies often result in high-dimensional data sets with much more variables than observations (n $\ll$ p). These data sets pose new challenges to statistical analysis: Variable selection becomes one of the most important…
Stability selection is a widely adopted resampling-based framework for high-dimensional variable selection. This paper seeks to broaden the use of an established stability estimator to evaluate the overall stability of the stability…
In modern data analysis, sparse model selection becomes inevitable once the number of predictors variables is very high. It is well-known that model selection procedures like the Lasso or Boosting tend to overfit on real data. The…
We study feature selection in high-dimensional regression under two distinct sources of instability: sampling variability and measurement error in the design matrix. Stability Selection addresses the former through sub-sampling and…
Lasso and other regularization procedures are attractive methods for variable selection, subject to a proper choice of shrinkage parameter. Given a set of potential subsets produced by a regularization algorithm, a consistent model…
The Lasso is a prominent algorithm for variable selection. However, its instability in the presence of correlated variables in the high-dimensional setting is well-documented. Although previous research has attempted to address this issue…
We propose a generalized debiased Lasso estimator based on a stability principle. When a single column of the design matrix is perturbed, the estimator admits a simple update formula that can be computed from the original solution. Under…
Reproducibility is imperative for any scientific discovery. More often than not, modern scientific findings rely on statistical analysis of high-dimensional data. At a minimum, reproducibility manifests itself in stability of statistical…
The instability in the selection of models is a major concern with data sets containing a large number of covariates. We focus on stability selection which is used as a technique to improve variable selection performance for a range of…
Stability selection is a versatile framework for structure estimation and variable selection in high-dimensional setting, primarily grounded in frequentist principles. In this paper, we propose an enhanced methodology that integrates…
Recently, many regularized procedures have been proposed for variable selection in linear regression, but their performance depends on the tuning parameter selection. Here a criterion for the tuning parameter selection is proposed, which…
Feature selection, as a vital dimension reduction technique, reduces data dimension by identifying an essential subset of input features, which can facilitate interpretable insights into learning and inference processes. Algorithmic…
Selecting regularization parameters in penalized high-dimensional graphical models in a principled, data-driven, and computationally efficient manner continues to be one of the key challenges in high-dimensional statistics. We present…
The Lasso has been widely used as a method for variable selection, valued for its simplicity and empirical performance. However, Lasso's selection stability deteriorates in the presence of correlated predictors. Several approaches have been…
A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include $K$-fold cross-validation ($K$-CV), Akaike information criterion (AIC),…
Stability Selection was recently introduced by Meinshausen and Buhlmann (2010) as a very general technique designed to improve the performance of a variable selection algorithm. It is based on aggregating the results of applying a selection…
Model selection is a major challenge in non-parametric clustering. There is no universally admitted way to evaluate clustering results for the obvious reason that no ground truth is available. The difficulty to find a universal evaluation…
Variable selection for high-dimensional, highly correlated data has long been a challenging problem, often yielding unstable and unreliable models. We propose a resample-aggregate framework that exploits diffusion models' ability to…