
(Ab)Using Regression for Data Adjustment

Statistics Theory, 2016-09-30, v6

Abstract

In various economic applications, people want to compare $n$ units with respect to certain quantities $Y_1, Y_2, \ldots, Y_n$ measuring their performance. The latter, however, is often influenced by certain factors which are beyond control of the units, and one would like to extract an adjusted performance from the data. Specifically, let $X_i \in \mathcal{X}$ summarize the factors of the $i$-th unit. Then one could think of a model equation $Y_i = f_o(X_i) + \epsilon_i$ with a regression function $f_o : \mathcal{X} \to \mathbb{R}$ describing the unavoidable influence of the factors $X_i$ and $\epsilon_i$ being the adjusted performance of the $i$-th unit. Now a common proposal is to estimate $f_o$ via regression methods by a function $\hat{f}$ depending on the current data $(X_i, Y_i)$, possibly augmented by additional past data, and to use the residuals $\hat{\epsilon}_i := Y_i - \hat{f}(X_i)$ as surrogates for the adjusted performances $\epsilon_i$. In the present report we discuss this approach, its potential pitfalls and (mis)interpretation. In particular, an unavoidable property of the residuals $\hat{\epsilon}_i$ is that they measure only parts of the adjusted performance while the remaining parts get hidden in the estimated function $\hat{f}$. Possible alternatives are mentioned briefly.
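The claim that residuals capture only part of the adjusted performance can be illustrated with a small simulation. The sketch below is not taken from the report: it assumes a linear regression function, Gaussian errors, and an ordinary least-squares fit with NumPy, and all names (`ols_residuals`, `beta_true`, `absorbed`) are hypothetical. In this special case the residuals equal the true errors minus their projection onto the column space of the design matrix, so that projected part is absorbed by the fitted function.

```python
import numpy as np

def ols_residuals(X, y):
    """Return the residuals y - X @ beta_hat of a least-squares fit."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta_hat

rng = np.random.default_rng(0)
n = 50

# Assumed setup: one factor per unit plus an intercept column.
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])

beta_true = np.array([1.0, 2.0])      # hypothetical f_o(x) = 1 + 2x
eps = rng.normal(0.0, 1.0, size=n)    # true adjusted performances (unobservable in practice)
y = X @ beta_true + eps

eps_hat = ols_residuals(X, y)

# Hat matrix H projects onto the column space of X.
# For OLS, eps_hat = (I - H) eps, so H @ eps is the part hidden in the fit.
H = X @ np.linalg.solve(X.T @ X, X.T)
absorbed = H @ eps

print(np.allclose(eps_hat, eps - absorbed))       # residuals = errors minus absorbed part
print(np.sum(eps_hat**2) < np.sum(eps**2))        # residuals are shrunk relative to errors
```

In a simulation the true errors `eps` are known, which is what makes the comparison possible; with real data only `eps_hat` is available, and the absorbed part cannot be recovered from the fit alone.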

Cite

@article{arxiv.1202.1964,
  title  = {(Ab)Using Regression for Data Adjustment},
  author = {Lutz Duembgen},
  journal= {arXiv preprint arXiv:1202.1964},
  year   = {2016}
}

Comments

Replaces an older manuscript "On Ranks of Regression Errors and Residuals"
