Differential Test Functioning via Robust Scaling

Methodology 2026-02-10 v6

Abstract

In the item response theory (IRT) literature, differential test functioning (DTF) has been conceptualized in terms of how the test response function differs over groups of respondents. This paper presents an alternative approach to DTF that focusses on how the distribution of the latent trait differs over groups, which is referred to as impact. It is proposed to evaluate DTF by comparing two estimates of impact: one that naively aggregates over all test items, and a robust alternative that down-weights items that exhibit differential item functioning (DIF). Taking this approach, this paper makes three contributions. First, it is shown that the difference between the naive and robust estimands provides a convenient effect size for quantifying the extent to which DIF affects conclusions about impact (as opposed to test scores). Second, it is shown how to construct a robust estimator that yields consistent estimates of impact whenever fewer than half of the items exhibit DIF. Third, a relatively general-purpose Wald test of the difference between two estimates of impact is developed. Simulations and an empirical example from physics education show how the proposed effect size and test statistic perform with the proposed robust estimator of impact, as well as with estimators that arise from conventional item-by-item tests of DIF.
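To make the comparison of naive and robust impact estimates concrete, the following is a minimal sketch and not the paper's estimator. It assumes that each item yields its own estimate of impact (the hypothetical list `item_estimates`), uses the mean as the naive aggregate, and uses the median as a stand-in robust aggregate, chosen only because the median also tolerates contamination in fewer than half of the items. The standard error passed to `wald_test` is a made-up illustrative value; the abstract does not say how the paper derives it.

```python
import math
import statistics


def impact_estimates(item_estimates):
    """Return (naive, robust, effect_size) from item-level impact estimates.

    Assumption: the naive aggregate is the plain mean, and the robust
    aggregate is the median (a stand-in, not the paper's robust estimator).
    """
    naive = statistics.fmean(item_estimates)
    robust = statistics.median(item_estimates)
    return naive, robust, naive - robust


def wald_test(difference, se_difference):
    """One-degree-of-freedom Wald test of H0: difference == 0.

    Assumption: se_difference is a valid standard error of the difference,
    supplied by the caller.
    """
    w = (difference / se_difference) ** 2
    # Survival function of a chi-square variable with 1 df, evaluated at w.
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


# Hypothetical data: six DIF-free items agree on an impact of 0.5,
# while two items with DIF suggest an impact of 1.5.
item_estimates = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.5, 1.5]
naive, robust, effect = impact_estimates(item_estimates)

# Hypothetical standard error of the difference, for illustration only.
w, p_value = wald_test(effect, se_difference=0.1)
print(naive, robust, effect, w, p_value)
```

In this toy example the two DIF items pull the naive estimate away from the value shared by the other six items, while the median is unaffected, so the difference between the two aggregates serves as the effect size for how much DIF changes the conclusion about impact.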

Cite

@article{arxiv.2409.03502,
  title  = {Differential Test Functioning via Robust Scaling},
  author = {Peter F. Halpin},
  journal= {arXiv preprint arXiv:2409.03502},
  year   = {2026}
}

Comments

23 pages, 3 figures