A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

S. A. Murphy; Y. Deng; E. B. Laber; H. R. Maei; R. S. Sutton; K. Witkiewitz

Machine Learning, 2016-07-19 (v1)

Abstract

We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.