Related papers: Robust Variable Selection under Cellwise Contamina…
There is a great need for robust techniques in data mining and machine learning contexts where many standard techniques such as principal component analysis and linear discriminant analysis are inherently susceptible to outliers.…
We propose a data-analytic method for detecting cellwise outliers. Given a robust covariance matrix, outlying cells (entries) in a row are found by the cellHandler technique which combines lasso regression with a stepwise application of…
Large datasets are often affected by cell-wise outliers in the form of missing or erroneous data. However, discarding any samples containing outliers may result in a dataset that is too small to accurately estimate the covariance matrix.…
Cellwise outliers are likely to occur together with casewise outliers in modern data sets with relatively large dimension. Recent work has shown that traditional robust regression methods may fail for data sets in this paradigm. The…
The sample covariance matrix is a cornerstone of multivariate statistics, but it is highly sensitive to outliers. These can be casewise outliers, such as cases belonging to a different population, or cellwise outliers, which are deviating…
Multivariate location and scatter matrix estimation is a cornerstone in multivariate data analysis. We consider this problem when the data may contain independent cellwise and casewise outliers. Flat data sets with a large number of…
Cellwise contamination remains a challenging problem for data scientists, particularly in research fields that require the selection of sparse features. Traditional robust methods may not be feasible nor efficient in dealing with such…
Many problems in signal processing require finding sparse solutions to under-determined, or ill-conditioned, linear systems of equations. When dealing with real-world data, the presence of outliers and impulsive noise must also be accounted…
Multivariate linear regression is a fundamental statistical task, but classical estimators such as ordinary least squares are highly sensitive to outliers. These may occur as casewise outliers that affect entire observations, or as outlying…
It is well-known that real data often contain outliers. The term outlier typically refers to a case, that is, a row of the $n \times d$ data matrix. In recent times a different type has come into focus, the cellwise outliers. These are…
Classical discriminant analysis (DA) is based on the mean and empirical covariance matrix of each class, both of which are sensitive to outliers in the data. In the past the focus was on casewise outliers, that is, datapoints that lie far…
Lasso is a popular and efficient approach to simultaneous estimation and variable selection in high-dimensional regression models. In this paper, a robust LAD-lasso method for multiple outcomes is presented that addresses the challenges of…
The dependency structure of multivariate data can be analyzed using the covariance matrix $\Sigma$. In many fields the precision matrix $\Sigma^{-1}$ is even more informative. As the sample covariance estimator is singular in…
Among semiparametric regression models, partially linear additive models provide a useful tool to include additive nonparametric components as well as a parametric component, when explaining the relationship between the response and a set…
Outlying observations can be challenging to handle and adversely affect subsequent analyses, especially in data with increasing dimensional complexity. Although outliers are not always undesired anomalies in the data and may possess…
The usual Minimum Covariance Determinant (MCD) estimator of a covariance matrix is robust against casewise outliers. These are cases (that is, rows of the data matrix) that behave differently from the majority of cases, raising suspicion…
We propose a computationally intensive method, the random lasso method, for variable selection in linear models. The method consists of two major steps. In step 1, the lasso method is applied to many bootstrap samples, each using a set of…
In this paper, we propose a novel variable selection approach in the framework of multivariate linear models taking into account the dependence that may exist between the responses. It consists in estimating beforehand the covariance matrix…
Penalized logistic regression is extremely useful for binary classification with large number of covariates (higher than the sample size), having several real life applications, including genomic disease classification. However, the…
We propose a residual randomization procedure designed for robust Lasso-based inference in the high-dimensional setting. Compared to earlier work that focuses on sub-Gaussian errors, the proposed procedure is designed to work robustly in…