Multi-Preference Actor Critic

Ishan Durugkar; Matthew Hausknecht; Adith Swaminathan; Patrick MacAlpine

Machine Learning · 2019-04-09 (v1) · Artificial Intelligence; Machine Learning

Abstract

Policy gradient algorithms typically combine discounted future rewards with an estimated value function to compute the direction and magnitude of parameter updates. However, for most reinforcement learning tasks, humans can provide additional insight that can be used to constrain policy learning. We introduce a general method for incorporating multiple feedback channels into a single policy gradient loss. In our formulation, the Multi-Preference Actor Critic (M-PAC), these different types of feedback are implemented as constraints on the policy. We use a Lagrangian relaxation so that these constraints can be satisfied with gradient descent while the policy is trained to maximize reward. Experiments on Atari and Pendulum verify that the constraints are respected and that they can accelerate learning.
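The abstract does not spell out M-PAC's update rules, so the following is only a minimal sketch of the general idea of a Lagrangian-relaxed policy gradient, under several assumptions of ours: a toy three-armed bandit instead of Atari or Pendulum, exact expected gradients instead of sampled returns and a learned critic, a single hypothetical "preference" constraint pi(arm 1) >= p_min, and an entropy bonus (weight tau) that we add so the primal-dual iteration settles. The policy logits take gradient ascent steps on the Lagrangian, while the multiplier lam takes projected steps that increase it whenever the constraint is violated. All names (`softmax`, `train`, `p_min`, `tau`, `alpha`, `beta`) are illustrative and do not come from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(constrained, steps=5000, alpha=1.0, beta=0.1, tau=0.1, p_min=0.5):
    """Primal-dual training of a softmax policy on a toy 3-armed bandit.

    Hypothetical setup (not from the paper): rewards favor arm 0, while an
    assumed 'preference' feedback channel asks that arm 1 be chosen with
    probability at least p_min, i.e. g(theta) = p_min - pi[1] <= 0.
    """
    rewards = [1.0, 0.8, 0.0]
    theta = [0.0, 0.0, 0.0]  # policy logits
    lam = 0.0  # Lagrange multiplier for the preference constraint
    for _ in range(steps):
        pi = softmax(theta)
        # dL/dpi_a, up to an additive constant that cancels below, for the
        # entropy-regularized Lagrangian
        #   L = sum_a pi_a * r_a + tau * H(pi) + lam * (pi_1 - p_min).
        s = [rewards[a] - tau * math.log(pi[a]) + (lam if a == 1 else 0.0)
             for a in range(3)]
        s_bar = sum(pi[a] * s[a] for a in range(3))
        # Exact softmax chain rule: dL/dtheta_b = pi_b * (s_b - s_bar).
        grad = [pi[b] * (s[b] - s_bar) for b in range(3)]
        # Primal step: gradient ascent on the Lagrangian.
        theta = [theta[b] + alpha * grad[b] for b in range(3)]
        if constrained:
            # Dual step: lam grows while pi_1 < p_min; projected onto lam >= 0.
            lam = max(0.0, lam + beta * (p_min - pi[1]))
    return softmax(theta), lam


if __name__ == "__main__":
    pi_free, _ = train(constrained=False)
    pi_con, lam = train(constrained=True)
    print("unconstrained pi:", [round(p, 3) for p in pi_free])
    print("constrained pi:  ", [round(p, 3) for p in pi_con], "lam:", round(lam, 3))
```

Comparing the two runs shows what the relaxation does in this toy: without the constraint the policy concentrates on arm 0, the highest-reward arm, while with it lam becomes positive and pulls pi[1] up to roughly p_min. The entropy bonus is a design choice of this sketch: the expected reward is linear in the action probabilities, so without it the simultaneous primal and dual updates tend to circle around the saddle point instead of converging to it.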

Keywords

policy gradient; reinforcement learning; multi-agent reinforcement learning

Cite

@article{arxiv.1904.03295,
  title   = {Multi-Preference Actor Critic},
  author  = {Ishan Durugkar and Matthew Hausknecht and Adith Swaminathan and Patrick MacAlpine},
  journal = {arXiv preprint arXiv:1904.03295},
  year    = {2019}
}

Comments

NeurIPS Workshop on Deep RL, 2018