Manifold Regularization for Kernelized LSTD
Abstract
Policy evaluation, i.e., approximating the value function or Q-function, is a key procedure in reinforcement learning (RL). It is a necessary component of policy iteration and can be used for variance reduction in policy gradient methods, so its quality has a significant impact on most RL algorithms. Motivated by manifold regularized learning, we propose a novel kernelized policy evaluation method that exploits the intrinsic geometry of the state space, learned from data, to achieve better sample efficiency and higher accuracy in Q-function approximation. Applying the proposed method in the Least-Squares Policy Iteration (LSPI) framework, we observe better policy quality than with widely used parametric basis functions on two standard benchmarks.
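To make the idea of manifold regularization concrete, the sketch below fits a kernel regressor with an added graph-Laplacian penalty (Laplacian-regularized least squares). It is a supervised-regression illustration only, not the paper's kernelized LSTD algorithm. The function names, the RBF kernel, the dense Gaussian-weighted graph, the hyperparameters `gamma_a`, `gamma_i`, and `bandwidth`, and the toy circle data are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    # Gaussian kernel matrix between the rows of X and the rows of Y.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def graph_laplacian(X, bandwidth=1.0):
    # Unnormalized Laplacian L = D - W of a dense Gaussian-weighted graph
    # (assumption: a k-nearest-neighbor graph is another common choice).
    W = rbf_kernel(X, X, bandwidth)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def fit_laplacian_rls(X, y, gamma_a=1e-3, gamma_i=1e-2, bandwidth=1.0):
    # Minimizes (1/n)||y - K a||^2 + gamma_a a^T K a + (gamma_i/n^2) a^T K L K a.
    # Setting the gradient to zero gives (K + n*gamma_a*I + (gamma_i/n) L K) a = y.
    n = X.shape[0]
    K = rbf_kernel(X, X, bandwidth)
    L = graph_laplacian(X, bandwidth)
    A = K + n * gamma_a * np.eye(n) + (gamma_i / n) * (L @ K)
    return np.linalg.solve(A, y)

def predict(X_train, alpha, X_new, bandwidth=1.0):
    # Kernel expansion f(x) = sum_i alpha_i k(x, x_i).
    return rbf_kernel(X_new, X_train, bandwidth) @ alpha

# Toy data: points on a circle (a 1-D manifold in 2-D) with a smooth noisy target.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=60)
X = np.column_stack([np.cos(theta), np.sin(theta)])
y = np.sin(theta) + 0.05 * rng.standard_normal(60)

alpha = fit_laplacian_rls(X, y)
y_hat = predict(X, alpha, X)
```

Here `gamma_a` controls the usual RKHS-norm (ambient) penalty and `gamma_i` controls smoothness along the data graph. Setting `gamma_i = 0` recovers ordinary kernel ridge regression.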
Cite
@article{arxiv.1710.05387,
title = {Manifold Regularization for Kernelized LSTD},
author = {Xinyan Yan and Krzysztof Choromanski and Byron Boots and Vikas Sindhwani},
journal = {arXiv preprint arXiv:1710.05387},
year = {2017}
}
Comments
6 pages, CoRL 2017 non-archival track