Thompson Sampling Algorithm for Stochastic Games

Optimization and Control 2026-01-30 v1

Abstract

We study a stochastic differential game with $N$ competitive players in a linear-quadratic framework with ergodic cost, where $d$-dimensional diffusion processes govern the state dynamics with an unknown common drift (matrix). Assuming a Gaussian prior on the drift, we use filtering techniques to update its posterior estimates. Based on these estimates, we propose a Thompson-sampling-based algorithm with dynamic episode lengths to approximate strategies. We show that the Bayesian regret for each player has an error bound of order $O(\sqrt{T\log(T)})$, where $T$ is the time horizon, independent of the number of players. This implies that the average regret per unit time goes to zero. Finally, we prove that the algorithm results in a Nash equilibrium.
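To make the ingredients named in the abstract concrete (Gaussian prior, posterior updates, Thompson sampling with dynamic episodes), here is a minimal Python sketch of a deliberately simplified toy problem. It is not the paper's algorithm: it assumes a single controller rather than $N$ players, a scalar state $dX_t = (aX_t + u_t)\,dt + dW_t$ with unknown drift $a$, an Euler discretization in place of continuous-time filtering, and a hypothetical episode rule (resample when the posterior precision has doubled). All function names and parameter values are illustrative assumptions.

```python
import numpy as np

def posterior_update(mean, precision, x, u, dx, dt):
    """Conjugate Gaussian update for a in dX = (a*X + u) dt + dW (one Euler step)."""
    new_precision = precision + x * x * dt
    new_mean = (precision * mean + x * (dx - u * dt)) / new_precision
    return new_mean, new_precision

def ergodic_lq_gain(a, q=1.0, r=1.0):
    """Gain k (control u = -k*x) minimizing the long-run average of q*x^2 + r*u^2.

    Solves the scalar Riccati equation 2*a*p - p^2/r + q = 0 and returns k = p/r.
    """
    return a + np.sqrt(a * a + q / r)

def thompson_sampling(a_true=0.5, horizon=200.0, dt=0.01, seed=0):
    """Toy single-controller loop; the episode rule is an assumption, not the paper's."""
    rng = np.random.default_rng(seed)
    mean, precision = 0.0, 1.0           # assumed N(0, 1) prior on the drift a
    x = 1.0                              # assumed initial state
    episode_precision = precision        # precision at the start of the episode
    a_sample = rng.normal(mean, precision ** -0.5)
    episodes = 1
    for _ in range(int(horizon / dt)):
        # Dynamic episode length: resample once the precision has doubled.
        if precision >= 2.0 * episode_precision:
            episode_precision = precision
            a_sample = rng.normal(mean, precision ** -0.5)
            episodes += 1
        u = -ergodic_lq_gain(a_sample) * x   # act as if the sampled drift were true
        dx = (a_true * x + u) * dt + np.sqrt(dt) * rng.normal()
        mean, precision = posterior_update(mean, precision, x, u, dx, dt)
        x += dx
    return mean, precision, episodes

if __name__ == "__main__":
    mean, precision, episodes = thompson_sampling()
    print(f"posterior mean={mean:.3f}, precision={precision:.1f}, episodes={episodes}")
```

Because each new episode requires the precision to double, episodes become longer as information accumulates, which is one common way to keep the number of policy switches small; the paper's own episode schedule may differ.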

Cite

@article{arxiv.2601.20973,
  title  = {Thompson Sampling Algorithm for Stochastic Games},
  author = {Asaf Cohen and Ruolan He and Yuqiong Wang},
  journal= {arXiv preprint arXiv:2601.20973},
  year   = {2026}
}