Reinforcement Learning with Parameterized Actions

Artificial Intelligence 2015-11-30 v4 Machine Learning

Abstract

We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions: discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with that action. We present the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goal-scoring and Platform domains.
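
To make the two-level choice concrete, the following is a minimal sketch of selecting a parameterized action: first a discrete action, then continuous parameters for it. It is not the Q-PAMDP algorithm itself. The epsilon-greedy rule, the linear-Gaussian parameter policy, and all names (`ParameterizedAction`, `select_action`, `gaussian_policy`, the actions `"run"` and `"jump"`) are assumptions made for illustration and do not come from the paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

State = Sequence[float]
ParamPolicy = Callable[[State, random.Random], List[float]]


@dataclass
class ParameterizedAction:
    name: str            # which discrete action was chosen
    params: List[float]  # continuous parameters chosen for that action


def gaussian_policy(weights: Sequence[float], sigma: float) -> ParamPolicy:
    """Hypothetical parameter policy: one parameter drawn from a Gaussian
    whose mean is a linear function of the state."""
    def policy(state: State, rng: random.Random) -> List[float]:
        mean = sum(w * s for w, s in zip(weights, state))
        return [rng.gauss(mean, sigma)]
    return policy


def select_action(
    state: State,
    q_values: Dict[str, float],
    param_policies: Dict[str, ParamPolicy],
    epsilon: float,
    rng: random.Random,
) -> ParameterizedAction:
    # Step 1: choose the discrete action (epsilon-greedy over assumed Q-values).
    names = sorted(q_values)
    if rng.random() < epsilon:
        name = rng.choice(names)
    else:
        name = max(names, key=lambda n: q_values[n])
    # Step 2: choose the continuous parameters for that specific action.
    params = param_policies[name](state, rng)
    return ParameterizedAction(name, params)


if __name__ == "__main__":
    rng = random.Random(0)
    policies = {
        "run": gaussian_policy([1.0, 0.0], sigma=0.1),
        "jump": gaussian_policy([0.5, 0.0], sigma=0.1),
    }
    q = {"run": 1.0, "jump": 2.0}  # made-up values for the current state
    print(select_action([2.0, 3.0], q, policies, epsilon=0.1, rng=rng))
```

In this sketch each discrete action owns its own parameter policy, so the action-selection rule and the parameter policies could be updated by separate learning procedures.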

Cite

@article{arxiv.1509.01644,
  title  = {Reinforcement Learning with Parameterized Actions},
  author = {Warwick Masson and Pravesh Ranchod and George Konidaris},
  journal= {arXiv preprint arXiv:1509.01644},
  year   = {2015}
}

Comments

Accepted for AAAI 2016