We present an actor-critic framework for Markov decision processes (MDPs) in which the objective is the variance-adjusted expected return. Our critic uses linear function approximation, and we extend the concept of compatible features to the variance-adjusted setting. We present an episodic actor-critic algorithm and show that it converges almost surely to a locally optimal point of the objective function.
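As an illustration only, the sketch below assumes the variance-adjusted objective takes the common mean-variance form E[R] - mu * Var[R], where R is the return of an episode and mu >= 0 is a risk-aversion weight. It computes a plug-in estimate of that quantity from a batch of sampled episode returns. The function name `variance_adjusted_objective` and the parameter `risk_weight` are hypothetical. This is not the authors' actor-critic algorithm, which additionally uses a linear critic with compatible features to estimate gradients.

```python
import statistics
from typing import Sequence


def variance_adjusted_objective(returns: Sequence[float], risk_weight: float) -> float:
    """Plug-in estimate of E[R] - risk_weight * Var[R] from episode returns.

    Assumption (illustrative): the objective is the mean-variance form above.
    `risk_weight` is a hypothetical name for the risk-aversion coefficient.
    """
    if not returns:
        raise ValueError("need at least one episode return")
    mean = statistics.fmean(returns)
    # Population (biased) variance, i.e. the sample estimate of E[R^2] - E[R]^2.
    var = statistics.pvariance(returns, mu=mean)
    return mean - risk_weight * var


# Two hypothetical policies with the same mean return (2.0): the one with
# lower return variance scores higher under a positive risk weight.
risky = [0.0, 4.0]
safe = [2.0, 2.0]
print(variance_adjusted_objective(risky, 1.0), variance_adjusted_objective(safe, 1.0))
```

With `risk_weight = 0` the estimate reduces to the ordinary mean return; increasing it penalizes policies whose episode returns are more spread out.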
@article{arxiv.1310.3697,
title = {Variance Adjusted Actor Critic Algorithms},
author = {Aviv Tamar and Shie Mannor},
journal = {arXiv preprint arXiv:1310.3697},
year = {2013}
}