Computing Robust Leverage Diagnostics when the Design Matrix Contains Coded Categorical Variables

Computation 2013-01-23 v1

Abstract

For a robust leverage diagnostic in linear regression, Rousseeuw and van Zomeren [1990] proposed using robust distance (Mahalanobis distance computed using robust estimates of location and covariance). However, a design matrix X that contains coded categorical predictor variables is often sufficiently sparse that robust estimates of location and covariance cannot be computed. Specifically, matrices formed by taking subsets of the rows of X are likely to be singular, causing algorithms that rely on subsampling to fail. Following the spirit of Maronna and Yohai [2000], we observe that extreme leverage points are extreme in the continuous predictor variables. We therefore propose a robust leverage diagnostic that combines a robust analysis of the continuous predictor variables and the classical definition of leverage.
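The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the diagnostic proposed in the paper: it only computes classical leverage (the diagonal of the hat matrix) and shows how a subsample of rows from a design matrix with a dummy-coded variable can be singular. The function names `classical_leverage` and `subsample_is_singular` and the toy data are assumptions made for illustration.

```python
import numpy as np

def classical_leverage(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X', via a thin QR.

    Assumes X has full column rank.
    """
    Q, _ = np.linalg.qr(X)          # Q is n x p with orthonormal columns
    return np.sum(Q ** 2, axis=1)   # h_ii = squared norm of row i of Q

def subsample_is_singular(X, rows):
    """True if the selected rows of X do not have full column rank."""
    return np.linalg.matrix_rank(X[rows]) < X.shape[1]

# Toy design (hypothetical): intercept, one continuous predictor,
# and a dummy variable for a level observed in only two rows.
rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=n)
dummy = np.zeros(n)
dummy[:2] = 1.0
X = np.column_stack([np.ones(n), x, dummy])

h = classical_leverage(X)

# A subsample that misses the rare level has an all-zero dummy column,
# so the submatrix is rank deficient.
rows = np.arange(2, 5)
singular = subsample_is_singular(X, rows)
```

Here the leverages in `h` sum to the number of columns of `X`, and `singular` is true because rows 2 to 4 contain no observation from the rare level. Subsampling-based robust estimators hit exactly this kind of rank deficiency, which is the failure the abstract describes.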

Cite

@article{arxiv.1301.5035,
  title  = {Computing Robust Leverage Diagnostics when the Design Matrix Contains Coded Categorical Variables},
  author = {Kjell Konis},
  journal= {arXiv preprint arXiv:1301.5035},
  year   = {2013}
}