Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents

Authors: Wenhao Li, Xiangfeng Wang, Bo Jin

Machine Learning · 2026-05 · v1

Abstract

Diffusion-based planning has achieved strong results in single-agent offline reinforcement learning, yet scaling it to many-agent systems remains intractable because of the curse of dimensionality in the joint trajectory space. We introduce MF-Diffuser, a framework that lifts trajectory planning to the Wasserstein space of trajectory distributions, where propagation of chaos ensures that a small representative subset of agents captures the full population dynamics. Our approach features a value-weighted chaotic entropy objective that reconciles generative fidelity with return maximization, and a hierarchical coarse-to-fine strategy that progressively grows the agent population during denoising. We establish end-to-end suboptimality bounds with four interpretable terms, revealing that the mean-field approximation error scales as $O(H^2/\sqrt{N})$ while the offline distribution shift provably does not grow with the population size $N$, and we prove that the generated policy is an approximate mean-field Nash equilibrium with explicit convergence guarantees. Experiments on three mean-field RL benchmarks (spanning stage games, sequential dynamics, and adversarial team competition) show that MF-Diffuser achieves the best return in the majority of settings, with the largest gains on suboptimal offline data and at extreme scales ($N \geq 10^3$).
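The $1/\sqrt{N}$ factor in the mean-field error term is the familiar rate at which an empirical average over $N$ independent agents approaches its population limit. The sketch below is a generic illustration of that rate, not the authors' method or proof: it assumes i.i.d. agent states drawn from Uniform(0, 1), reads $H$ as the planning horizon (the abstract does not define it), and uses a hypothetical constant `c` for the $O(H^2/\sqrt{N})$ term. The function names are illustrative only.

```python
import math
import random
import statistics


def empirical_mean_error(n_agents: int, n_trials: int = 200, seed: int = 0) -> float:
    """Average |empirical mean - population mean| over repeated trials.

    Assumption for illustration: each agent's state is drawn i.i.d. from
    Uniform(0, 1), so the population (mean-field) mean is exactly 0.5.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        states = [rng.random() for _ in range(n_agents)]
        errors.append(abs(statistics.fmean(states) - 0.5))
    return statistics.fmean(errors)


def illustrative_bound(horizon: int, n_agents: int, c: float = 1.0) -> float:
    """Hypothetical O(H^2 / sqrt(N)) term; the constant c is unspecified here."""
    return c * horizon ** 2 / math.sqrt(n_agents)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        err = empirical_mean_error(n)
        # If err ~ 1/sqrt(n), then err * sqrt(n) stays roughly constant.
        print(n, round(err, 4), round(err * math.sqrt(n), 3))
```

Under these assumptions, `err * sqrt(n)` hovers near a constant as `n` grows, and the hypothetical bound quadruples when the horizon doubles but only halves when the number of agents quadruples.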

Comments: 71 pages, 15 figures, 16 tables

Cite

@article{arxiv.2605.30190,
  title  = {Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents},
  author = {Wenhao Li and Xiangfeng Wang and Bo Jin},
  journal= {arXiv preprint arXiv:2605.30190},
  year   = {2026}
}
