Related papers: Robust subset selection
Outlying observations can be challenging to handle and adversely affect subsequent analyses, especially in data with increasing dimensional complexity. Although outliers are not always undesired anomalies in the data and may possess…
Fully robust versions of the elastic net estimator are introduced for linear and logistic regression. The algorithms to compute the estimators are based on the idea of repeatedly applying the non-robust classical estimators to data subsets…
The last decade has seen a number of advances in computationally efficient algorithms for statistical methods subject to robustness constraints. An estimator may be robust in a number of different ways: to contamination of the dataset, to…
A robust estimator is proposed for the parameters that characterize the linear regression problem. It is based on the notion of shrinkages, often used in Finance and previously studied for outlier detection in multivariate data. A thorough…
The problem of best subset selection in linear regression is considered with the aim to find a fixed size subset of features that best fits the response. This is particularly challenging when the total available number of features is very…
In this paper we discuss the variable selection method from \ell0-norm constrained regression, which is equivalent to the problem of finding the best subset of a fixed size. Our study focuses on two aspects, consistency and computation. We…
The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is…
In high-dimensional statistics, variable selection recovers the latent sparse patterns from all possible covariate combinations. This paper proposes a novel optimization method to solve the exact L0-regularized regression problem, which is…
Cellwise outliers are likely to occur together with casewise outliers in modern data sets with relatively large dimension. Recent work has shown that traditional robust regression methods may fail for data sets in this paradigm. The…
A robust estimator for a wide family of mixtures of linear regression is presented. Robustness is based on the joint adoption of the Cluster Weighted Model and of an estimator based on trimming and restrictions. The selected model provides…
Contamination can severely distort an estimator unless the estimation procedure is suitably robust. This is a well-known issue and has been addressed in Robust Statistics, however, the relation of contamination and distorted variable…
Sparse methods are the standard approach to obtain interpretable models with high prediction accuracy. Alternatively, algorithmic ensemble methods can achieve higher prediction accuracy at the cost of loss of interpretability. However, the…
Algorithmic robust statistics has traditionally focused on the contamination model where a small fraction of the samples are arbitrarily corrupted. We consider a recent contamination model that combines two kinds of corruptions: (i) small…
We study the optimal sample complexity of variable selection in linear regression under general design covariance, and show that subset selection is optimal while under standard complexity assumptions, efficient algorithms for this problem…
We study stochastic programs where the decision-maker cannot observe the distribution of the exogenous uncertainties but has access to a finite set of independent samples from this distribution. In this setting, the goal is to find a…
We study the problem of selecting limited features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications in settings where collecting each feature is…
In today's era of big data, robust least-squares regression becomes a more challenging problem when considering the adversarial corruption along with explosive growth of datasets. Traditional robust methods can handle the noise but suffer…
Many modern datasets are collected automatically and are thus easily contaminated by outliers. This led to a regain of interest in robust estimation, including new notions of robustness such as robustness to adversarial contamination of the…
The problem of identifying the most discriminating features when performing supervised learning has been extensively investigated. In particular, several methods for variable selection in model-based classification have been proposed.…
Robust statistical estimators offer resilience against outliers but are often computationally challenging, particularly in high-dimensional sparse settings. Modern optimization techniques are utilized for robust sparse association…