Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents
Abstract
Diffusion-based planning has achieved strong results in single-agent offline reinforcement learning, yet scaling to many-agent systems remains intractable due to the curse of dimensionality in the joint trajectory space. We introduce MF-Diffuser, a framework that lifts trajectory planning to the Wasserstein space of trajectory distributions, where the propagation of chaos ensures that a small representative subset of agents captures the full population dynamics. Our approach features a value-weighted chaotic entropy objective that reconciles generative fidelity with return maximization, and a hierarchical coarse-to-fine strategy that progressively grows the agent population during denoising. We establish end-to-end suboptimality bounds with four interpretable terms, revealing that mean-field approximation error scales as while offline distribution shift provably does not grow with population size, and prove that the generated policy is an approximate mean-field Nash equilibrium with explicit convergence guarantees. Experiments on three mean-field RL benchmarks -- spanning stage games, sequential dynamics, and adversarial team competition -- show that MF-Diffuser achieves the best return in the majority of settings, with the largest gains on suboptimal offline data and at extreme scales.
Comments: 71 pages, 15 figures, 16 tables
Cite
@article{arxiv.2605.30190,
title = {Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents},
author = {Wenhao Li and Xiangfeng Wang and Bo Jin},
journal = {arXiv preprint arXiv:2605.30190},
year = {2026}
}