Graph-Based Prediction Models for Data Debiasing

Methodology 2025-04-22 v2 Machine Learning Signal Processing

Abstract

Bias in data collection, arising from both under-reporting and over-reporting, poses significant challenges in critical applications such as healthcare and public safety. In this work, we introduce Graph-based Over- and Under-reporting Debiasing (GROUD), a novel graph-based optimization framework that debiases reported data by jointly estimating the true incident counts and the associated reporting bias probabilities. By modeling the bias as a smooth signal over a graph constructed from geophysical or feature-based similarities, our convex formulation not only ensures a unique solution but also comes with theoretical recovery guarantees under certain assumptions. We validate GROUD on both challenging simulated experiments and real-world datasets -- including Atlanta emergency calls and COVID-19 vaccine adverse event reports -- demonstrating its robustness and superior performance in accurately recovering debiased counts. This approach paves the way for more reliable downstream decision-making in systems affected by reporting irregularities.
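The abstract does not state the GROUD objective, so the following is only a generic illustration of what "smooth signal over a graph" usually means, not the paper's formulation. It assumes the common convention that smoothness is measured by the Laplacian quadratic form x^T L x, which equals the sum of w_ij (x_i - x_j)^2 over edges. The 4-node path graph, the edge weights, and the two bias vectors are hypothetical values chosen for illustration.

```python
def laplacian(n, edges):
    """Build the combinatorial Laplacian L = D - W of an undirected weighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return L


def smoothness(L, x):
    """Quadratic form x^T L x; small values mean neighbours have similar x."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical path graph 0 - 1 - 2 - 3 with unit edge weights.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
L = laplacian(4, edges)

# Hypothetical reporting-probability vectors over the four nodes.
smooth_p = [0.50, 0.55, 0.60, 0.65]  # varies slowly between neighbours
rough_p = [0.50, 0.90, 0.20, 0.80]   # jumps between neighbours

print(smoothness(L, smooth_p) < smoothness(L, rough_p))
```

A penalty of this kind is convex in x, which is one standard way a smoothness assumption can enter a convex program; whether GROUD uses exactly this term is not stated in the abstract.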

Cite

@article{arxiv.2504.09348,
  title  = {Graph-Based Prediction Models for Data Debiasing},
  author = {Dongze Wu and Hanyang Jiang and Yao Xie},
  journal= {arXiv preprint arXiv:2504.09348},
  year   = {2025}
}

Comments

We submitted this arXiv version by mistake. We have decided to update the original submission (arXiv:2307.07898) instead of submitting a separate article.