Robustness to missing data: breakdown point analysis
Abstract
Missing data is pervasive in econometric applications, and rarely is it plausible that the data are missing (completely) at random. This paper proposes a methodology for studying the robustness of results drawn from incomplete datasets. Selection is measured as the divergence from the distribution of complete observations to the distribution of incomplete observations. The breakdown point is defined as the minimal amount of selection needed to overturn a given result. Reporting point estimates and lower confidence intervals of the breakdown point is a simple, concise way to communicate the robustness of a result. An estimator of the breakdown point is proposed and shown to be root-n consistent and asymptotically normal. This estimator can be applied directly to conclusions drawn from any model identified with the generalized method of moments (GMM) that satisfies mild assumptions. Simulations demonstrate the finite sample performance of the breakdown point estimator on averages, linear regression, and logistic regression. The methodology is illustrated by estimating the breakdown point of conclusions drawn from several randomized controlled trials suffering from missing data due to attrition.
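To make the definition of the breakdown point concrete, the following is a minimal toy sketch, not the paper's estimator. It assumes the simplest case named in the abstract (an average), takes the Kullback-Leibler divergence as the measure of selection (an assumption; the abstract does not say which divergence is used), and restricts the distribution of the missing outcomes to reweightings of the observed sample. The function names `tilt` and `toy_breakdown_point` and the claim "population mean > threshold" are hypothetical choices made for illustration.

```python
import numpy as np


def tilt(y, lam):
    """Weights proportional to exp(lam * y), computed in a numerically stable way."""
    z = lam * y
    w = np.exp(z - z.max())
    return w / w.sum()


def toy_breakdown_point(y_obs, n_missing, threshold=0.0, iters=200):
    """Smallest KL divergence from the complete-case sample that overturns
    the claim 'population mean > threshold' (toy illustration only).

    The distribution of the missing outcomes is assumed to be a reweighting
    of the observed sample; KL divergence is an assumed choice of divergence.
    """
    y = np.asarray(y_obs, dtype=float)
    n = y.size
    if n_missing == 0:
        return float("inf")            # nothing is missing, nothing to overturn
    p = n / (n + n_missing)            # share of complete observations
    ybar = y.mean()
    if ybar <= threshold:
        return 0.0                     # claim already fails with no selection
    # Mean the missing outcomes must have for the overall mean to hit the threshold.
    m = (threshold - p * ybar) / (1.0 - p)
    if m <= y.min():
        return float("inf")            # boundary case treated as infeasible for simplicity
    # The KL-minimizing reweighting with mean m is an exponential tilt of the
    # sample; find the tilt parameter by bisection (tilted mean increases in lam).
    lo, hi = -1.0, 0.0
    while tilt(y, lo) @ y > m:
        lo *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilt(y, mid) @ y > m:
            hi = mid
        else:
            lo = mid
    w = tilt(y, 0.5 * (lo + hi))
    pos = w > 0                        # convention: 0 * log(0) = 0
    return float(np.sum(w[pos] * np.log(w[pos] * n)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(loc=0.5, scale=1.0, size=500)   # simulated complete cases
    print(toy_breakdown_point(y, n_missing=100))
    print(toy_breakdown_point(y, n_missing=400))
```

In this sketch a larger value means the claim survives more selection, and adding more missing observations can only lower the value, which matches the intuition behind reporting a breakdown point alongside a result.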
Cite
@article{arxiv.2406.06804,
title = {Robustness to missing data: breakdown point analysis},
author = {Daniel Ober-Reynolds},
journal = {arXiv preprint arXiv:2406.06804},
year = {2025}
}
Comments
66 pages, 3 figures. Presented at the 2023 North American Summer Meeting of the Econometric Society. Accepted manuscript