Opponent Aware Reinforcement Learning

Victor Gallego; Roi Naveiro; David Rios Insua; David Gomez-Ullate Oteiza

Machine Learning 2019-08-27 v2 Machine Learning

Authors: Victor Gallego, Roi Naveiro, David Rios Insua, David Gomez-Ullate Oteiza

Abstract

We introduce Threatened Markov Decision Processes (TMDPs) as an extension of the classical Markov Decision Process framework for Reinforcement Learning (RL). TMDPs support a decision maker against potential opponents in an RL context. We also propose a level-k thinking scheme that yields a novel learning approach for TMDPs. After introducing our framework and deriving theoretical results, we provide empirical evidence through extensive experiments, showing the benefits of accounting for adversaries in RL while the agent learns.

Keywords

Markov decision processes; reinforcement learning; adversarial examples

Cite

@article{arxiv.1908.08773,
  title  = {Opponent Aware Reinforcement Learning},
  author = {Victor Gallego and Roi Naveiro and David Rios Insua and David Gomez-Ullate Oteiza},
  journal= {arXiv preprint arXiv:1908.08773},
  year   = {2019}
}

Comments

Substantially extends the previous work: https://www.aaai.org/ojs/index.php/AAAI/article/view/5106. This article draws heavily from arXiv:1809.01560.
