Related papers: Robust Multi-Model Subset Selection
Machine learning and data analysis have been used in many areas of robotics, especially for modelling. Data are usually the result of sensor measurements and, as such, may be subject to noise and outliers. The presence of outliers…
The best subset selection (or "best subsets") estimator is a classic tool for sparse regression, and developments in mathematical optimization over the past decade have made it more computationally tractable than ever. Notwithstanding its…
This paper presents a fast methodology, called ROBOUT, to identify outliers in a response variable conditional on a set of linearly related predictors, retrieved from a large granular dataset. ROBOUT is shown to be effective and…
Outliers widely occur in big-data applications and may severely affect statistical estimation and inference. In this paper, a framework of outlier-resistant estimation is introduced to robustify an arbitrarily given loss function. It has a…
The Residual Congruent Subset (RCS) is a new method for finding outliers in the linear regression setting. Like many other outlier detection procedures, RCS searches for a subset which minimizes a criterion. The difference is that the new…
Real data often contain anomalous cases, also known as outliers. These may spoil the resulting analysis but they may also contain valuable information. In either case, the ability to detect such anomalies is essential. A useful tool for…
Linear mixed models (LMMs) are a popular class of methods for analyzing longitudinal and clustered data. However, such models can be sensitive to outliers, and this can lead to biased inference on model parameters and inaccurate prediction…
When applying a statistical method in practice it often occurs that some observations deviate from the usual assumptions. However, many classical methods are sensitive to outliers. The goal of robust statistics is to develop methods that…
Multi-dimensional scaling (MDS) plays a central role in data exploration, dimensionality reduction and visualization. State-of-the-art MDS algorithms are not robust to outliers, yielding significant errors in the embedding even when only a…
The problem of identifying the most discriminating features when performing supervised learning has been extensively investigated. In particular, several methods for variable selection in model-based classification have been proposed.…
Hyperparameter tuning and model selection are important steps in machine learning. Unfortunately, classical hyperparameter calibration and model selection procedures are sensitive to outliers and heavy-tailed data. In this work, we…
Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden required to handle large data sets that cannot be stored or processed on one machine. Another…
In many machine learning tasks, a common approach for dealing with large-scale data is to build a small summary, e.g., a coreset, that can efficiently represent the original input. However, real-world datasets usually contain outliers…
Panel data regression models have gained increasing attention in different areas of research, including but not limited to econometrics, environmental sciences, epidemiology, and the behavioral and social sciences. However, the presence of…
Robust regression models in the presence of outliers have significant practical relevance in areas such as signal processing, financial econometrics, and energy management. Many existing robust regression methods, either grounded in…
Robust density estimation refers to the consistent estimation of the density function even when the data is contaminated by outliers. We find that existing forest density estimation at a certain point is inherently resistant to the outliers…
Outlier detection is a fundamental task in data mining and has many applications including detecting errors in databases. While there has been extensive prior work on methods for outlier detection, modern datasets often have sizes that are…
There are several methods for obtaining very robust estimates of regression parameters that asymptotically resist 50% of outliers in the data. Differences in the behaviour of these algorithms depend on the distance between the regression…
The presence of outliers (anomalous values) in synthetic aperture radar (SAR) data and the misspecification in statistical image models may result in inaccurate inferences. To avoid such issues, the Rayleigh regression model based on a…
In high-dimensional multivariate regression problems, enforcing low rank in the coefficient matrix offers effective dimension reduction, which greatly facilitates parameter estimation and model interpretation. However, commonly used…