Policy Optimization via Importance Sampling

Alberto Maria Metelli; Matteo Papini; Francesco Faccio; Marcello Restelli

Machine Learning 2018-11-01 v2 Artificial Intelligence Machine Learning

Abstract

Policy optimization is an effective reinforcement learning approach for solving continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate. In this paper, we propose a novel, model-free policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation; then we define a surrogate objective function, which is optimized offline whenever a new batch of trajectories is collected. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.
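
The idea of penalizing an importance sampling estimate by its uncertainty can be illustrated with a small sketch. The code below is not the bound or the surrogate derived in the paper: the penalty form (a constant divided by the square root of the effective sample size), the 1-D Gaussian setting, the toy return, and all function and variable names are assumptions chosen for illustration only.

```python
import math
import random

def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def penalized_is_objective(samples, returns, behav_mean, target_mean, std, lam):
    """Return (IS estimate, effective sample size, penalized objective).

    The penalty lam / sqrt(ESS) is a hypothetical stand-in for a
    variance-aware correction; it is not taken from the paper.
    """
    # Importance weights w_i = p_target(x_i) / p_behavioral(x_i).
    weights = [
        math.exp(gaussian_logpdf(x, target_mean, std)
                 - gaussian_logpdf(x, behav_mean, std))
        for x in samples
    ]
    n = len(samples)
    is_estimate = sum(w * r for w, r in zip(weights, returns)) / n
    # ESS equals n when all weights are equal and shrinks as they spread out.
    ess = sum(weights) ** 2 / sum(w * w for w in weights)
    return is_estimate, ess, is_estimate - lam / math.sqrt(ess)

# Toy data: samples drawn once from the behavioral distribution N(0, 1).
rng = random.Random(0)
behav_mean, std = 0.0, 1.0
samples = [rng.gauss(behav_mean, std) for _ in range(200)]
returns = [-(x - 1.0) ** 2 for x in samples]  # toy return, maximized at x = 1

# Evaluate a target equal to the behavioral distribution and one far from it.
same = penalized_is_objective(samples, returns, behav_mean, 0.0, std, lam=1.0)
far = penalized_is_objective(samples, returns, behav_mean, 3.0, std, lam=1.0)
print("ESS (same target):", same[1])
print("ESS (far target): ", far[1])
```

In this sketch, a target distribution far from the one that generated the samples yields a smaller effective sample size and therefore a larger penalty, which is one simple way to discourage offline optimization from moving where the estimate is unreliable.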

Keywords

optimization algorithm, hyperparameter optimization, policy gradient

Cite

@article{arxiv.1809.06098,
  title  = {Policy Optimization via Importance Sampling},
  author = {Alberto Maria Metelli and Matteo Papini and Francesco Faccio and Marcello Restelli},
  journal= {arXiv preprint arXiv:1809.06098},
  year   = {2018}
}
