
When Do Graph Foundation Models Transfer? A Data-Centric Theory

Machine Learning, 2026-05, v1

Abstract

Graph foundation models (GFMs) aim to reuse a single backbone across diverse graph domains, yet transfer is often uneven and can even be negative. While most prior work improves transfer through architectural or adaptation choices, we ask a data-centric question: which properties of two graph domains determine how much the outputs of a fixed representation model change between them? Using a graphon-based continuous limit for dense graphs, we show that for both set-based and message-passing tokenizations, any Lipschitz backbone admits an explicit decomposition of cross-domain output shift into (i) graph-specific finite-sample approximation terms and (ii) an intrinsic, relabeling-invariant domain discrepancy capturing structural mismatch. A key ingredient is positional-encoding (PE) stability: we establish stability guarantees for spectral PEs and highlight contrasting behaviors of eigenvector- versus subspace-based PEs. Experiments on synthetic and real graphs validate the theory and translate the decomposition into guidance for data curation in GFM transfer.
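The contrast between eigenvector- and subspace-based PEs can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the 4-cycle example, and the use of the unnormalized Laplacian are all assumptions made for illustration. The 4-cycle Laplacian has a repeated eigenvalue, so its eigenvectors are only defined up to a rotation inside that eigenspace, whereas the projector onto the spanned subspace is unique.

```python
import numpy as np

def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A (an illustrative choice)."""
    return np.diag(adj.sum(axis=1)) - adj

def eigvec_pe(adj, k):
    """Eigenvector PE: the k Laplacian eigenvectors with smallest eigenvalues (n x k)."""
    _, vecs = np.linalg.eigh(laplacian(adj))  # eigenvalues come back in ascending order
    return vecs[:, :k]

def projector(v):
    """Subspace PE: orthogonal projector onto the column span of v (n x n)."""
    return v @ v.T

# 4-cycle: Laplacian eigenvalues are 0, 2, 2, 4, so eigenvalue 2 is repeated.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
k = 3
v = eigvec_pe(adj, k)

# Rotate the two eigenvectors that share eigenvalue 2. The rotated columns
# are an equally valid set of eigenvectors for the same graph.
theta = 0.7
q = np.eye(k)
q[1:, 1:] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]
v_alt = v @ q

# Eigenvector PEs differ between the two valid bases; the projector does not.
eigvec_gap = np.linalg.norm(v - v_alt)
subspace_gap = np.linalg.norm(projector(v) - projector(v_alt))
print(f"eigenvector PE gap: {eigvec_gap:.3f}")
print(f"subspace PE gap:    {subspace_gap:.2e}")
```

In this toy case the eigenvector PE changes by a non-trivial amount under an arbitrary basis choice, while the subspace PE is unchanged up to floating-point error. The sketch only shows this basis ambiguity; it does not reproduce the stability guarantees stated in the abstract.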

Comments: 21 pages, including appendix. Accepted at ICML 2026

Cite

@article{arxiv.2605.29828,
  title  = {When Do Graph Foundation Models Transfer? A Data-Centric Theory},
  author = {Jiajun Zhu and Ying Chen and Peihao Wang and Yixuan He and Pan Li and Aditya Akella and Zhangyang Wang},
  journal = {arXiv preprint arXiv:2605.29828},
  year   = {2026}
}