Optimizing the CVaR via Sampling
Machine Learning; Artificial Intelligence
2014-11-25 (v4)
Abstract
Conditional Value at Risk (CVaR) is a prominent risk measure that is used extensively in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the CVaR gradient, in the spirit of the likelihood-ratio method. We analyze the bias of the estimator and prove the convergence of a corresponding stochastic gradient descent algorithm to a local CVaR optimum. Our method makes it possible to consider CVaR optimization in new domains. As an example, we consider a reinforcement learning application and learn a risk-sensitive controller for the game of Tetris.
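For a continuous random variable Z with density f(z; θ), and assuming the lower-tail convention CVaR_α(Z) = E[Z | Z ≤ VaR_α(Z)], one conditional-expectation form of the gradient is ∇θ CVaR_α(Z) = E[∇θ log f(Z; θ) (Z − VaR_α(Z)) | Z ≤ VaR_α(Z)]. Under an upper-tail convention the inequality flips. The Python sketch below is a hypothetical illustration of a likelihood-ratio estimator built on this identity, not the authors' code: it replaces VaR_α with the ⌈αN⌉-th smallest sample and averages the score-weighted terms over the tail. The function names `cvar_gradient_estimate` and `gaussian_score`, and the Gaussian test case, are our own assumptions.

```python
import math
import random
from statistics import NormalDist


def cvar_gradient_estimate(samples, score_fn, alpha):
    """Likelihood-ratio style estimate of the CVaR gradient (lower-tail convention assumed).

    samples:  i.i.d. draws z_1..z_N of Z under the current parameters theta
    score_fn: hypothetical helper, z -> list of d/dtheta_j log f(z; theta)
    alpha:    tail probability in (0, 1)
    """
    n = len(samples)
    # Empirical VaR: the ceil(alpha * N)-th smallest sample.
    k = max(1, math.ceil(alpha * n))
    var_hat = sorted(samples)[k - 1]

    grad = None
    for z in samples:
        if z <= var_hat:  # only tail samples contribute
            term = [s * (z - var_hat) for s in score_fn(z)]
            grad = term if grad is None else [g + t for g, t in zip(grad, term)]
    # Normalise by alpha * N, the expected number of tail samples.
    return [g / (alpha * n) for g in grad]


def gaussian_score(mu, sigma):
    """Score of N(mu, sigma^2) with respect to theta = (mu, sigma)."""
    def score(z):
        d = z - mu
        return [d / sigma**2, d**2 / sigma**3 - 1.0 / sigma]
    return score


rng = random.Random(0)
mu, sigma, alpha = 0.0, 1.0, 0.05
zs = [rng.gauss(mu, sigma) for _ in range(200_000)]
grad = cvar_gradient_estimate(zs, gaussian_score(mu, sigma), alpha)

# Closed form for a Gaussian: CVaR_alpha = mu - sigma * pdf(q) / alpha with q = Phi^{-1}(alpha),
# so the exact gradient with respect to (mu, sigma) is (1, -pdf(q) / alpha).
q = NormalDist().inv_cdf(alpha)
exact = [1.0, -NormalDist().pdf(q) / alpha]
print("estimate:", grad)
print("exact:   ", exact)
```

For α = 0.05 the exact gradient is about (1, −2.06), and the sampled estimate should approach it as N grows. A stochastic gradient method would then move θ by a small step size times such an estimate, ascending when the lower tail of a reward is being maximised.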
Cite
@article{arxiv.1404.3862,
title = {Optimizing the CVaR via Sampling},
author = {Aviv Tamar and Yonatan Glassner and Shie Mannor},
journal = {arXiv preprint arXiv:1404.3862},
year = {2014}
}
Comments
To appear in AAAI 2015