English

Embarrassingly Parallel GFlowNets

Machine Learning 2024-06-06 v1 Machine Learning

Abstract

GFlowNets are a promising alternative to MCMC sampling for discrete compositional random variables. Training GFlowNets requires repeated evaluations of the unnormalized target distribution or reward function. However, for large-scale posterior sampling, this may be prohibitive since it incurs traversing the data several times. Moreover, if the data are distributed across clients, employing standard GFlowNets leads to intensive client-server communication. To alleviate both these issues, we propose embarrassingly parallel GFlowNet (EP-GFlowNet). EP-GFlowNet is a provably correct divide-and-conquer method to sample from product distributions of the form R()R1()...RN()R(\cdot) \propto R_1(\cdot) ... R_N(\cdot) -- e.g., in parallel or federated Bayes, where each RnR_n is a local posterior defined on a data partition. First, in parallel, we train a local GFlowNet targeting each RnR_n and send the resulting models to the server. Then, the server learns a global GFlowNet by enforcing our newly proposed \emph{aggregating balance} condition, requiring a single communication step. Importantly, EP-GFlowNets can also be applied to multi-objective optimization and model reuse. Our experiments illustrate the EP-GFlowNets's effectiveness on many tasks, including parallel Bayesian phylogenetics, multi-objective multiset, sequence generation, and federated Bayesian structure learning.

Keywords

Cite

@article{arxiv.2406.03288,
  title  = {Embarrassingly Parallel GFlowNets},
  author = {Tiago da Silva and Luiz Max Carvalho and Amauri Souza and Samuel Kaski and Diego Mesquita},
  journal= {arXiv preprint arXiv:2406.03288},
  year   = {2024}
}

Comments

Accepted to ICML 2024

R2 v1 2026-06-28T16:54:35.140Z