Mitigating Structural Overfitting: A Distribution-Aware Rectification Framework for Missing Feature Imputation

Yifan Song; Fenglin Yu; Yihong Luo; Xingjian Tao; Siya Qiu; Kai Han; Jing Tang


Machine Learning; Social and Information Networks. 2026-04-07, v3


Abstract

Incomplete node features are ubiquitous in real-world scenarios such as user profiling and cold-start recommendation, and they severely hinder the practical deployment of graph learning systems (e.g., GNNs). Existing solutions typically rely on diffusion-based structural smoothing (e.g., feature propagation) to impute missing values. However, we find that these approaches suffer from structural overfitting, which leads to three progressive challenges: 1) performance degradation on disjoint graphs, 2) loss of semantic diversity due to over-smoothing, and 3) feature distribution shift when generalizing to unseen graph structures (inductive tasks). To address these challenges, we introduce the DART framework. It first employs Global Structural Augmentation (GSA), which establishes global correlations to bridge disjoint components and extend diffusion coverage. Building on this, we design a semantic rectifier based on masked autoencoding; this module learns the latent feature manifold to recover natural semantic details. Crucially, we introduce a test-time distribution rectification mechanism that projects structurally biased features back onto the learned manifold during inference, effectively bridging the inductive distribution gap. Furthermore, because synthetic masking fails to reflect real-world sparsity, we present a new dataset, Sailing, collected from voyage records with naturally missing attributes. Extensive experiments on six public datasets and Sailing demonstrate that DART significantly outperforms state-of-the-art methods in both transductive and inductive settings. Our code and dataset are available at https://github.com/yfsong00/DART.
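The first challenge named in the abstract (degradation on disjoint graphs) can be reproduced in a toy setting. The sketch below is illustrative only and rests on our own assumptions: `propagate_features` is a feature-propagation-style diffusion that uses a row-normalized adjacency for simplicity (other normalizations are also common), and `add_virtual_node` is a hypothetical helper that bridges components through a single global node. Neither function is DART's GSA module or its rectifier, and all names are made up for this example.

```python
import numpy as np


def propagate_features(adj, x, known_mask, num_iters=40):
    """Feature-propagation-style imputation (illustrative sketch).

    Each step averages features over neighbors using the row-normalized
    adjacency D^-1 A, then resets observed entries to their known values.
    """
    deg = adj.sum(axis=1, keepdims=True)
    # Isolated nodes keep an all-zero row instead of dividing by zero.
    a_norm = np.divide(adj, deg, out=np.zeros_like(adj, dtype=float), where=deg > 0)
    out = np.where(known_mask, x, 0.0)  # unknown entries start at zero
    for _ in range(num_iters):
        out = a_norm @ out                  # one diffusion step
        out = np.where(known_mask, x, out)  # keep observed values fixed
    return out


def add_virtual_node(adj, x, known_mask):
    """Hypothetical global bridge: append one virtual node linked to every
    node so that disjoint components share a diffusion path. This only
    illustrates bridging; it is not the paper's GSA."""
    n, d = x.shape
    big = np.zeros((n + 1, n + 1))
    big[:n, :n] = adj
    big[n, :n] = 1.0
    big[:n, n] = 1.0
    x_big = np.vstack([x, np.zeros((1, d))])
    mask_big = np.vstack([known_mask, np.zeros((1, d), dtype=bool)])
    return big, x_big, mask_big


# Two disjoint components: path 0-1-2 and edge 3-4; only node 0 is observed.
adj = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0
x = np.array([[1.0], [0.0], [0.0], [0.0], [0.0]])
known = np.array([[True], [False], [False], [False], [False]])

plain = propagate_features(adj, x, known)
bridged = propagate_features(*add_virtual_node(adj, x, known))
print(plain[:, 0].round(3))     # component {3, 4} stays at 0
print(bridged[:5, 0].round(3))  # nodes 3 and 4 now receive signal
```

Under pure diffusion, a component with no observed features never receives any signal, which matches the motivation for global correlations that extend diffusion coverage. In the bridged toy graph, every unknown node drifts toward the same value, a miniature version of the loss of diversity the abstract attributes to over-smoothing.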

Keywords

fault tree analysis, graph generation, graph representation learning

Cite

@article{arxiv.2512.06356,
  title  = {Mitigating Structural Overfitting: A Distribution-Aware Rectification Framework for Missing Feature Imputation},
  author = {Yifan Song and Fenglin Yu and Yihong Luo and Xingjian Tao and Siya Qiu and Kai Han and Jing Tang},
  journal= {arXiv preprint arXiv:2512.06356},
  year   = {2026}
}

Comments

Accepted by SIGIR 2026.
