Risk-Averse Learning by Temporal Difference Methods
Optimization and Control, Machine Learning
2020-03-03 (v1)
Abstract
We consider reinforcement learning with performance evaluated by a dynamic risk measure. We construct a projected risk-averse dynamic programming equation and study its properties. We then propose risk-averse counterparts of the temporal difference methods and prove that they converge with probability one. We also report an empirical study on a complex transportation problem.
Cite
@article{arxiv.2003.00780,
  title   = {Risk-Averse Learning by Temporal Difference Methods},
  author  = {Umit Kose and Andrzej Ruszczynski},
  journal = {arXiv preprint arXiv:2003.00780},
  year    = {2020}
}