
Optimizing the CVaR via Sampling

Machine Learning 2014-11-25 v4 Artificial Intelligence Machine Learning

Abstract

Conditional Value at Risk (CVaR) is a prominent risk measure that is used extensively in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the CVaR gradient, in the spirit of the likelihood-ratio method. We analyze the bias of the estimator and prove the convergence of a corresponding stochastic gradient descent algorithm to a local CVaR optimum. Our method makes it possible to consider CVaR optimization in new domains. As an example, we consider a reinforcement learning application and learn a risk-sensitive controller for the game of Tetris.
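To make the idea of a sampling-based, likelihood-ratio style CVaR gradient estimator concrete, the following is a minimal sketch and not necessarily the exact estimator analyzed in the paper. It assumes a lower-tail convention (the CVaR is the mean of the worst `alpha` fraction of rewards), a known score function (the gradient of the log-density with respect to the parameters), and the empirical quantile as the Value at Risk estimate. The function name `cvar_and_gradient` and the Gaussian toy problem are hypothetical, chosen only for illustration.

```python
import numpy as np


def cvar_and_gradient(rewards, score, alpha):
    """Estimate the lower-tail CVaR and its gradient from samples.

    rewards: shape (n,), reward R(x_i) of each sample (assumed convention).
    score:   shape (n, d), gradient of log f(x_i; theta) w.r.t. theta.
    alpha:   tail fraction in (0, 1).
    """
    n = len(rewards)
    # Empirical alpha-quantile of the rewards serves as the VaR estimate.
    var_hat = np.quantile(rewards, alpha)
    # Indicator of samples that fall in the worst alpha tail.
    tail = rewards <= var_hat
    # CVaR estimate: average reward over the tail samples.
    cvar_hat = rewards[tail].mean()
    # Likelihood-ratio style gradient: score weighted by the reward's
    # shortfall below the VaR, restricted to the tail.
    weights = (rewards - var_hat) * tail
    grad_hat = (score * weights[:, None]).sum(axis=0) / (alpha * n)
    return cvar_hat, grad_hat


# Toy problem (an assumption for illustration): X ~ N(theta, 1), R(x) = x.
rng = np.random.default_rng(0)
theta, alpha, n = 0.5, 0.05, 200_000
x = theta + rng.standard_normal(n)
score = (x - theta)[:, None]  # d/dtheta of log N(x; theta, 1)
cvar_hat, grad_hat = cvar_and_gradient(x, score, alpha)
print(cvar_hat, grad_hat)
```

In this toy problem shifting `theta` shifts the whole reward distribution, so the true CVaR gradient is 1 and the estimate should be close to that value for large `n`. Because the quantile is itself estimated from the same samples, an estimator of this kind is generally biased for finite `n`. A stochastic gradient scheme would repeatedly draw fresh samples and move `theta` along `grad_hat` with a decaying step size.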

Cite

@article{arxiv.1404.3862,
  title  = {Optimizing the CVaR via Sampling},
  author = {Aviv Tamar and Yonatan Glassner and Shie Mannor},
  journal= {arXiv preprint arXiv:1404.3862},
  year   = {2014}
}

Comments

To appear in AAAI 2015
