Related papers: Robust Variable Selection under Cellwise Contamina…

200 papers

There is a great need for robust techniques in data mining and machine learning contexts where many standard techniques such as principal component analysis and linear discriminant analysis are inherently susceptible to outliers.…

Methodology · Statistics 2015-09-28 Garth Tarr, Samuel Müller, Neville C. Weber

We propose a data-analytic method for detecting cellwise outliers. Given a robust covariance matrix, outlying cells (entries) in a row are found by the cellHandler technique which combines lasso regression with a stepwise application of…

Methodology · Statistics 2024-07-08 Jakob Raymaekers, Peter J. Rousseeuw

Large datasets are often affected by cell-wise outliers in the form of missing or erroneous data. However, discarding any samples containing outliers may result in a dataset that is too small to accurately estimate the covariance matrix.…

Statistics Theory · Mathematics 2023-11-13 Karim Lounici, Grégoire Pacreau

Cellwise outliers are likely to occur together with casewise outliers in modern data sets with relatively large dimension. Recent work has shown that traditional robust regression methods may fail for data sets in this paradigm. The…

Statistics Theory · Mathematics 2016-12-28 Andy Leung, Hongyang Zhang, Ruben H. Zamar

The sample covariance matrix is a cornerstone of multivariate statistics, but it is highly sensitive to outliers. These can be casewise outliers, such as cases belonging to a different population, or cellwise outliers, which are deviating…

Methodology · Statistics 2025-05-27 Fabio Centofanti, Mia Hubert, Peter J. Rousseeuw

Multivariate location and scatter matrix estimation is a cornerstone in multivariate data analysis. We consider this problem when the data may contain independent cellwise and casewise outliers. Flat data sets with a large number of…

Statistics Theory · Mathematics 2014-06-24 Claudio Agostinelli, Andy Leung, Victor J. Yohai, Ruben H. Zamar

Cellwise contamination remains a challenging problem for data scientists, particularly in research fields that require the selection of sparse features. Traditional robust methods may not be feasible nor efficient in dealing with such…

Methodology · Statistics 2024-03-04 Peng Su, Garth Tarr, Samuel Muller, Suojin Wang

Many problems in signal processing require finding sparse solutions to under-determined, or ill-conditioned, linear systems of equations. When dealing with real-world data, the presence of outliers and impulsive noise must also be accounted…

Statistics Theory · Mathematics 2017-05-08 Jasin Machkour, Michael Muma, Bastian Alt, Abdelhak M. Zoubir

Multivariate linear regression is a fundamental statistical task, but classical estimators such as ordinary least squares are highly sensitive to outliers. These may occur as casewise outliers that affect entire observations, or as outlying…

Methodology · Statistics 2026-05-11 Fabio Centofanti, Mia Hubert, Peter J. Rousseeuw

It is well-known that real data often contain outliers. The term outlier typically refers to a case, that is, a row of the $n \times d$ data matrix. In recent times a different type has come into focus, the cellwise outliers. These are…

Methodology · Statistics 2024-07-08 Jakob Raymaekers, Peter J. Rousseeuw

Classical discriminant analysis (DA) is based on the mean and empirical covariance matrix of each class, both of which are sensitive to outliers in the data. In the past the focus was on casewise outliers, that is, datapoints that lie far…

Methodology · Statistics 2026-05-29 Fabio Centofanti, Can Hakan Dagidir, Mia Hubert, Peter J. Rousseeuw

Lasso is a popular and efficient approach to simultaneous estimation and variable selection in high-dimensional regression models. In this paper, a robust LAD-lasso method for multiple outcomes is presented that addresses the challenges of…

Methodology · Statistics 2022-12-02 Jyrki Möttönen, Tero Lähderanta, Janne Salonen, Mikko J. Sillanpää

The dependency structure of multivariate data can be analyzed using the covariance matrix $\Sigma$. In many fields the precision matrix $\Sigma^{-1}$ is even more informative. As the sample covariance estimator is singular in…

Methodology · Statistics 2015-06-04 Viktoria Öllerer, Christophe Croux

Among semiparametric regression models, partially linear additive models provide a useful tool to include additive nonparametric components as well as a parametric component, when explaining the relationship between the response and a set…

Methodology · Statistics 2024-02-01 Graciela Boente, Alejandra Martínez

Outlying observations can be challenging to handle and adversely affect subsequent analyses, especially in data with increasing dimensional complexity. Although outliers are not always undesired anomalies in the data and may possess…

Methodology · Statistics 2025-09-18 Anthony-Alexander Christidis, Gabriela Cohen-Freue

The usual Minimum Covariance Determinant (MCD) estimator of a covariance matrix is robust against casewise outliers. These are cases (that is, rows of the data matrix) that behave differently from the majority of cases, raising suspicion…

Methodology · Statistics 2024-07-08 Jakob Raymaekers, Peter J. Rousseeuw

We propose a computationally intensive method, the random lasso method, for variable selection in linear models. The method consists of two major steps. In step 1, the lasso method is applied to many bootstrap samples, each using a set of…

Applications · Statistics 2011-04-19 Sijian Wang, Bin Nan, Saharon Rosset, Ji Zhu

In this paper, we propose a novel variable selection approach in the framework of multivariate linear models taking into account the dependence that may exist between the responses. It consists in estimating beforehand the covariance matrix…

Statistics Theory · Mathematics 2017-07-14 Marie Perrot-Dockès, Céline Lévy-Leduc, Laure Sansonnet, Julien Chiquet

Penalized logistic regression is extremely useful for binary classification with a large number of covariates (higher than the sample size), having several real-life applications, including genomic disease classification. However, the…

Methodology · Statistics 2023-04-10 Ayanendranath Basu, Abhik Ghosh, María Jaenada, Leandro Pardo

We propose a residual randomization procedure designed for robust Lasso-based inference in the high-dimensional setting. Compared to earlier work that focuses on sub-Gaussian errors, the proposed procedure is designed to work robustly in…

Methodology · Statistics 2021-08-20 Y. Samuel Wang, Si Kai Lee, Panos Toulis, Mladen Kolar