(Ab)Using Regression for Data Adjustment

Lutz Duembgen

Statistics Theory, 2016-09-30, v6

Abstract

In various economic applications, people want to compare $n$ units with respect to certain quantities $Y_1, Y_2, \ldots, Y_n$ measuring their performance. This performance, however, is often influenced by certain factors which are beyond the control of the units, and one would like to extract an adjusted performance from the data. Specifically, let $X_i \in \mathcal{X}$ summarize the factors of the $i$-th unit. Then one could think of a model equation $Y_i = f_o(X_i) + \epsilon_i$, with a regression function $f_o : \mathcal{X} \to \mathbb{R}$ describing the unavoidable influence of the factors $X_i$, and $\epsilon_i$ being the adjusted performance of the $i$-th unit. Now a common proposal is to estimate $f_o$ via regression methods by a function $\hat{f}$ depending on the current data $(X_i,Y_i)$, possibly augmented by additional past data, and to use the residuals $\hat{\epsilon}_i := Y_i - \hat{f}(X_i)$ as surrogates for the adjusted performances $\epsilon_i$. In the present report we discuss this approach, its potential pitfalls, and its (mis)interpretation. In particular, an unavoidable property of the residuals $\hat{\epsilon}_i$ is that they measure only parts of the adjusted performance while the remaining parts get hidden in the estimated function $\hat{f}$. Possible alternatives are mentioned briefly.
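The last point of the abstract can be illustrated in the simplest special case. The following is only a sketch under assumptions that are not taken from the report: a linear regression function $f_o(x) = x^\top \beta$, an ordinary least squares fit for $\hat{f}$, and simulated Gaussian data. The function name `residuals_vs_errors` and its parameters are hypothetical. In this setting the residuals equal $(I - H)\epsilon$, where $H$ is the projection onto the column space of the design matrix, so the part $H\epsilon$ of the adjusted performance is absorbed into $\hat{f}$.

```python
import numpy as np


def residuals_vs_errors(n=50, p=5, seed=0):
    """Simulate Y = X @ beta + eps and split eps into the part seen in the
    residuals and the part absorbed by the least squares fit.

    Assumption (not from the report): linear model, ordinary least squares.
    """
    rng = np.random.default_rng(seed)
    # Design matrix with an intercept column and p - 1 random factors.
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    beta = rng.normal(size=p)   # true coefficients (simulated)
    eps = rng.normal(size=n)    # true adjusted performances
    Y = X @ beta + eps

    # Least squares estimate of beta and the resulting residuals.
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta_hat

    # Projection ("hat") matrix onto the column space of X.
    H = X @ np.linalg.pinv(X)
    absorbed = H @ eps          # part of eps hidden in the fitted function
    return eps, resid, absorbed


if __name__ == "__main__":
    eps, resid, absorbed = residuals_vs_errors()
    # The residuals and the absorbed part add up to the true errors.
    print(np.allclose(resid + absorbed, eps))
    # Ranking units by residuals need not agree with ranking by true errors.
    print(np.argsort(eps)[:5], np.argsort(resid)[:5])
```

Because the two parts are orthogonal, the residual sum of squares never exceeds the sum of squared true errors in this sketch, and the gap tends to grow with the number of fitted parameters `p` relative to `n`.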

Keywords

nonparametric regression, statistical inference

Cite

@article{arxiv.1202.1964,
  title  = {(Ab)Using Regression for Data Adjustment},
  author = {Lutz Duembgen},
  journal= {arXiv preprint arXiv:1202.1964},
  year   = {2016}
}

Comments

Replaces an older manuscript, "On Ranks of Regression Errors and Residuals".
