Constrained Upper Confidence Reinforcement Learning

Liyuan Zheng; Lillian J. Ratliff

Machine Learning 2020-01-28 v1 Machine Learning

Abstract

Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning to settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well motivated by a number of applications, including exploration of unknown, potentially unsafe environments. We present an algorithm, C-UCRL, and show that it achieves sub-linear regret ($O(T^{\frac{3}{4}}\sqrt{\log(T/\delta)})$) with respect to the reward while satisfying the constraints, even during learning, with probability $1-\delta$. Illustrative examples are provided.
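
To make the "sub-linear" claim concrete, the following minimal sketch evaluates the growth rate $T^{3/4}\sqrt{\log(T/\delta)}$ stated above and divides it by $T$. It is an illustration only, not code from the paper: the function names are hypothetical, and the constant hidden by the $O(\cdot)$ notation is assumed to be 1, so the numbers show the trend rather than actual regret values.

```python
import math


def regret_bound_rate(T: int, delta: float) -> float:
    """Growth rate T^(3/4) * sqrt(log(T / delta)).

    Assumption: the constant hidden by O(.) is set to 1, so this is
    only the shape of the bound, not a regret value from the paper.
    """
    if T < 1 or not (0.0 < delta < 1.0):
        raise ValueError("need T >= 1 and 0 < delta < 1")
    return T ** 0.75 * math.sqrt(math.log(T / delta))


def average_regret_rate(T: int, delta: float) -> float:
    """Bound divided by the horizon T; sub-linear regret means this shrinks."""
    return regret_bound_rate(T, delta) / T


if __name__ == "__main__":
    delta = 0.05  # illustrative failure probability, chosen arbitrarily
    for T in (10**2, 10**4, 10**6, 10**8):
        print(f"T={T:>9d}  bound/T = {average_regret_rate(T, delta):.4f}")
```

Because the bound grows more slowly than $T$, the per-step quantity printed by the loop decreases as the horizon grows, which is what sub-linear regret means in practice.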

Keywords

online learning, reinforcement learning, machine learning theory

Cite

@article{arxiv.2001.09377,
  title  = {Constrained Upper Confidence Reinforcement Learning},
  author = {Liyuan Zheng and Lillian J. Ratliff},
  journal= {arXiv preprint arXiv:2001.09377},
  year   = {2020}
}
