Related papers: High-Breakdown Robust Multivariate Methods
Real data often contain anomalous cases, also known as outliers. These may spoil the resulting analysis but they may also contain valuable information. In either case, the ability to detect such anomalies is essential. A useful tool for…
Learning in the presence of outliers is a fundamental problem in statistics. Until recently, all known efficient unsupervised learning algorithms were very sensitive to outliers in high dimensions. In particular, even for the task of robust…
Cellwise outliers are likely to occur together with casewise outliers in modern data sets with relatively large dimension. Recent work has shown that traditional robust regression methods may fail for data sets in this paradigm. The…
This paper presents a fast methodology, called ROBOUT, to identify outliers in a response variable conditional on a set of linearly related predictors, retrieved from a large granular dataset. ROBOUT is shown to be effective and…
Nonparametric regression models offer a way to understand and quantify relationships between variables without having to identify an appropriate family of possible regression functions. Although many estimation methods for these models have…
This note investigates the problem of detecting outliers in longitudinal data. It compares well-known methods used in official statistics with proposals from the fields of data mining and machine learning that are based on the distance…
Outlier detection algorithms typically assign an outlier score to each observation in a dataset, indicating the degree to which an observation is an outlier. However, these scores are often not comparable across algorithms and can be…
Multivariate location and scatter matrix estimation is a cornerstone in multivariate data analysis. We consider this problem when the data may contain independent cellwise and casewise outliers. Flat data sets with a large number of…
Classical discriminant analysis (DA) is based on the mean and empirical covariance matrix of each class, both of which are sensitive to outliers in the data. In the past the focus was on casewise outliers, that is, datapoints that lie far…
Machine learning and data analysis have been used in many robotics fields, especially for modelling. Data are usually the result of sensor measurements and, as such, they might be subjected to noise and outliers. The presence of outliers…
A robust estimation framework for binary regression models is studied, aiming to extend traditional approaches like logistic regression models. While previous studies largely focused on logistic models, we explore a broader class of models…
Outlying observations can be challenging to handle and adversely affect subsequent analyses, especially in data with increasing dimensional complexity. Although outliers are not always undesired anomalies in the data and may possess…
Most of the regularization methods such as the LASSO have one (or more) regularization parameter(s), and to select the value of the regularization parameter is essentially equal to select a model. Thus, to obtain a model suitable for the…
The sample covariance matrix is a cornerstone of multivariate statistics, but it is highly sensitive to outliers. These can be casewise outliers, such as cases belonging to a different population, or cellwise outliers, which are deviating…
Principal component regression uses principal components as regressors. It is particularly useful in prediction settings with high-dimensional covariates. The existing literature treating of Bayesian approaches is relatively sparse. We…
The inflated beta regression model is widely used for modeling continuous proportions with values at the boundaries. Maximum likelihood estimation for these models is well-known for its sensitivity to outliers, which can severely distort…
A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced…
The best subset selection (or "best subsets") estimator is a classic tool for sparse regression, and developments in mathematical optimization over the past decade have made it more computationally tractable than ever. Notwithstanding its…
In data analysis, contamination caused by outliers is inevitable, and robust statistical methods are strongly demanded. In this paper, our concern is to develop a new approach for robust data analysis based on scoring rules. The scoring…
Robust statistics traditionally focuses on outliers, or perturbations in total variation distance. However, a dataset could be corrupted in many other ways, such as systematic measurement errors and missing covariates. We generalize the…