Multiagent Soft Q-Learning

Artificial Intelligence 2018-04-27 v1

Abstract

Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint-action space.
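
As a rough illustration of relative overgeneralization, the following sketch uses a small discrete cooperative matrix game rather than the continuous games studied in the paper. The payoff values, the function names, and the assumption that each agent scores its actions against a uniformly random partner are all illustrative choices made here, not taken from the paper. Under these assumptions, each agent prefers the action that looks best on average, and the resulting joint action misses the joint optimum.

```python
# Shared payoff for a two-agent cooperative game (illustrative numbers, an
# assumption for this sketch). Rows: agent 0's action; columns: agent 1's.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def average_payoffs(payoff, agent):
    """Mean shared payoff of each action of `agent`, assuming the other
    agent picks its action uniformly at random."""
    n = len(payoff)
    if agent == 0:
        return [sum(payoff[i]) / n for i in range(n)]
    return [sum(payoff[i][j] for i in range(n)) / n for j in range(n)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda k: values[k])


def joint_optimum(payoff):
    """Joint action (i, j) with the highest shared payoff."""
    n = len(payoff)
    pairs = ((i, j) for i in range(n) for j in range(n))
    return max(pairs, key=lambda ij: payoff[ij[0]][ij[1]])


# Each agent independently picks the action with the best average payoff.
greedy_joint = (
    argmax(average_payoffs(PAYOFF, 0)),
    argmax(average_payoffs(PAYOFF, 1)),
)
best_joint = joint_optimum(PAYOFF)

if __name__ == "__main__":
    print("independently preferred joint action:", greedy_joint)
    print("payoff there:", PAYOFF[greedy_joint[0]][greedy_joint[1]])
    print("joint optimum:", best_joint)
    print("payoff there:", PAYOFF[best_joint[0]][best_joint[1]])
```

In this toy game the optimal joint action is heavily penalized whenever the partner deviates from it, so its average payoff looks poor to each agent individually, and both agents settle on a safer but lower-paying joint action.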

Cite

@article{arxiv.1804.09817,
  title  = {Multiagent Soft Q-Learning},
  author = {Ermo Wei and Drew Wicke and David Freelan and Sean Luke},
  journal= {arXiv preprint arXiv:1804.09817},
  year   = {2018}
}

Comments

Accepted at the AAAI 2018 Spring Symposium
