A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

Machine Learning, 2016-07-19, v1

Abstract

We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.

Cite

@article{arxiv.1607.05047,
  title  = {A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward},
  author = {S. A. Murphy and Y. Deng and E. B. Laber and H. R. Maei and R. S. Sutton and K. Witkiewitz},
  journal= {arXiv preprint arXiv:1607.05047},
  year   = {2016}
}