A Black-Box Debiasing Framework for Conditional Sampling

Methodology 2025-10-14 v1 Machine Learning

Abstract

Conditional sampling is a fundamental task in Bayesian statistics and generative modeling. Consider the problem of sampling from the posterior distribution $P_{X|Y=y^*}$ for some observation $y^*$, where the likelihood $P_{Y|X}$ is known and we are given $n$ i.i.d. samples $D=\{X_i\}_{i=1}^n$ drawn from an unknown prior distribution $\pi_X$. Suppose that $f(\hat{\pi}_{X^n})$ is the distribution of a posterior sample generated by an algorithm (e.g., a conditional generative model or the Bayes rule) when $\hat{\pi}_{X^n}$ is the empirical distribution of the training data. Averaging over the randomness of the training data $D$, we have $\mathbb{E}_D\left(\hat{\pi}_{X^n}\right)=\pi_X$, but we do not have $\mathbb{E}_D\left\{f(\hat{\pi}_{X^n})\right\}=f(\pi_X)$ because $f$ is nonlinear, which leads to a bias. In this paper, we propose a black-box debiasing scheme that improves the accuracy of such a naive plug-in approach. For any integer $k$, and under boundedness of the likelihood and smoothness of $f$, we generate samples $\hat{X}^{(1)},\dots,\hat{X}^{(k)}$ and weights $w_1,\dots,w_k$ such that $\sum_{i=1}^k w_i P_{\hat{X}^{(i)}}$ is a $k$-th order approximation of $f(\pi_X)$, where the generation process treats $f$ as a black box. Our generation process achieves higher accuracy when averaged over the randomness of the training data, without degrading the variance. This can be interpreted as improving memorization without compromising generalization in generative models.
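The plug-in bias described above can be checked exactly in a toy setting. The sketch below is illustrative only and is not taken from the paper: it assumes a binary $X$ with prior $\pi_X(1)=0.3$, likelihood values $P(Y=y^*|X=0)=0.2$ and $P(Y=y^*|X=1)=0.8$, and takes $f$ to be the Bayes rule. All function names are hypothetical. The final function uses a classical two-point (Richardson-style) extrapolation across sample sizes, with weights $2$ and $-1$, as a stand-in to show how a signed mixture can cancel the leading bias term. It is not the scheme proposed by the authors.

```python
from math import comb


def posterior_prob(p1, lik0=0.2, lik1=0.8):
    """Bayes rule f: maps a prior P(X=1)=p1 to the posterior P(X=1 | Y=y*).

    lik0 and lik1 are assumed likelihood values P(Y=y*|X=0) and P(Y=y*|X=1).
    """
    return p1 * lik1 / (p1 * lik1 + (1.0 - p1) * lik0)


def expected_plug_in(n, p1=0.3):
    """Exact E_D[f(pi_hat)] for n i.i.d. samples.

    The empirical prior is pi_hat(1) = K/n with K ~ Binomial(n, p1),
    so the expectation is a finite sum over k = 0..n.
    """
    return sum(
        comb(n, k) * p1**k * (1.0 - p1) ** (n - k) * posterior_prob(k / n)
        for k in range(n + 1)
    )


def plug_in_bias(n, p1=0.3):
    """Bias of the naive plug-in posterior: E_D[f(pi_hat)] - f(pi)."""
    return expected_plug_in(n, p1) - posterior_prob(p1)


def extrapolated_bias(n, p1=0.3):
    """Bias of the signed mixture 2*f(pi_hat_n) - f(pi_hat_{n/2}).

    Classical two-point extrapolation (an assumption made for illustration):
    if the bias behaves like c/n, the weights (2, -1) cancel that term.
    """
    return 2.0 * plug_in_bias(n, p1) - plug_in_bias(n // 2, p1)


if __name__ == "__main__":
    for n in (10, 20, 100):
        print(n, plug_in_bias(n), extrapolated_bias(n))
```

Because the Bayes map is concave in the prior probability in this example, the plug-in estimate is biased downward, and the bias shrinks roughly like $1/n$. The weighted combination sums to one but uses a negative weight, which mirrors the abstract's weighted mixture $\sum_i w_i P_{\hat{X}^{(i)}}$ only in spirit.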

Cite

@article{arxiv.2510.11071,
  title  = {A Black-Box Debiasing Framework for Conditional Sampling},
  author = {Han Cui and Jingbo Liu},
  journal= {arXiv preprint arXiv:2510.11071},
  year   = {2025}
}