SEDGE: Structural Extrapolated Data Generation

Machine Learning 2026-05-15 v2

Abstract

This paper addresses the challenge of generating data beyond the training data and proposes a framework for Structural Extrapolated Data GEneration (SEDGE), based on suitable assumptions about the underlying data-generating process. We provide conditions under which data satisfying novel specifications can be generated reliably, and we establish that the distribution of such data is approximately identifiable under certain "conservative" assumptions but inherently non-identifiable without them. On the algorithmic side, we develop two practical methods for extrapolated data generation, one based on a structure-informed optimization strategy and the other on diffusion posterior sampling. We verify the extrapolation performance on synthetic data and use extrapolated image generation as a real-world scenario to illustrate the validity of the proposed framework.
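As a rough intuition for why structural assumptions can make extrapolation possible, the sketch below uses a toy, generic example. It is not the SEDGE algorithm. It assumes a hypothetical additive data-generating process x = g1[z1] + g2[z2] over two discrete attributes (the names `g1`, `g2`, `design` and all values are illustrative). The pair (2, 2) never appears in training. A structure-free lookup table therefore has nothing to say about that pair, while a model that respects the assumed additive structure can predict it from the other combinations.

```python
import numpy as np

# Hypothetical ground-truth components (assumed for illustration only).
g1 = np.array([0.0, 1.5, -2.0])  # effect of attribute z1 in {0, 1, 2}
g2 = np.array([0.0, 0.7, 3.0])   # effect of attribute z2 in {0, 1, 2}


def generate(z1, z2):
    """Assumed noiseless additive data-generating process x = g1[z1] + g2[z2]."""
    return g1[z1] + g2[z2]


# Training support: every attribute combination except the novel pair (2, 2).
train_pairs = [(a, b) for a in range(3) for b in range(3) if (a, b) != (2, 2)]
z1_tr = np.array([p[0] for p in train_pairs])
z2_tr = np.array([p[1] for p in train_pairs])
x_tr = generate(z1_tr, z2_tr)

# A structure-free "lookup table" model has no entry for the unseen pair.
table = {p: x for p, x in zip(train_pairs, x_tr)}


def design(z1, z2):
    """Features for an additive model: intercept plus one-hot levels 1, 2 of each attribute."""
    z1, z2 = np.atleast_1d(z1), np.atleast_1d(z2)
    cols = [np.ones(len(z1))]
    cols += [(z1 == k).astype(float) for k in (1, 2)]
    cols += [(z2 == k).astype(float) for k in (1, 2)]
    return np.stack(cols, axis=1)


# Fit the additive model by least squares on the observed combinations only.
coef, *_ = np.linalg.lstsq(design(z1_tr, z2_tr), x_tr, rcond=None)

# Extrapolate to the unseen combination and compare with the ground truth.
x_pred = design(2, 2) @ coef
x_true = generate(np.array([2]), np.array([2]))
print("lookup table has (2, 2):", (2, 2) in table)
print("additive prediction:", x_pred, "ground truth:", x_true)
```

Extrapolation works here only because the additive assumption holds exactly. If the true process contained an interaction specific to (2, 2), no model fit to these training pairs could recover it. This is a small-scale analogue of the non-identifiability that arises without structural assumptions.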

Cite

@article{arxiv.2604.02482,
  title  = {SEDGE: Structural Extrapolated Data Generation},
  author = {Kun Zhang and Jiaqi Sun and Yiqing Li and Ignavier Ng and Namrata Deka and Shaoan Xie},
  journal= {arXiv preprint arXiv:2604.02482},
  year   = {2026}
}