Bootstrapping Upper Confidence Bound

Botao Hao; Yasin Abbasi-Yadkori; Zheng Wen; Guang Cheng

Machine Learning, 2019-11-01, v3

Abstract

The Upper Confidence Bound (UCB) method is arguably the most celebrated approach to online decision making with partial information feedback. Existing techniques for constructing confidence bounds typically rely on concentration inequalities, which leads to over-exploration. In this paper, we propose a non-parametric and data-dependent UCB algorithm based on the multiplier bootstrap. To improve its finite-sample performance, we further incorporate a second-order correction into this construction. In theory, we derive both problem-dependent and problem-independent regret bounds for multi-armed bandits under a much weaker tail assumption than standard sub-Gaussianity. Numerical results demonstrate significant regret reductions by our method compared with several baselines in a range of multi-armed and linear bandit problems.
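To make the idea of a data-dependent confidence bound concrete, here is a minimal Python sketch of a UCB index whose exploration bonus is a multiplier-bootstrap quantile. It is an illustration under stated assumptions, not the algorithm from the paper. The Rademacher multipliers, the function names `multiplier_bootstrap_bonus` and `run_bandit`, the Gaussian rewards, and all parameter values are hypothetical choices. The second-order correction is left out because the abstract does not give its form. With a single observation the bonus is exactly zero, which hints at why a finite-sample correction is useful.

```python
import numpy as np

def multiplier_bootstrap_bonus(rewards, delta=0.1, n_boot=200, rng=None):
    """(1 - delta) quantile of the multiplier-bootstrapped centered sample mean."""
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float)
    centered = rewards - rewards.mean()
    # Assumption: Rademacher (+1/-1) multipliers, one row per bootstrap replicate.
    weights = rng.choice([-1.0, 1.0], size=(n_boot, rewards.size))
    boot_means = weights @ centered / rewards.size
    return float(np.quantile(boot_means, 1.0 - delta))

def run_bandit(arm_means, horizon=500, delta=0.1, seed=0):
    """Play a Gaussian bandit with index = sample mean + bootstrap bonus."""
    rng = np.random.default_rng(seed)
    history = [[] for _ in arm_means]
    pulls = np.zeros(len(arm_means), dtype=int)
    for t in range(horizon):
        if t < len(arm_means):
            arm = t  # pull every arm once before using the index
        else:
            index = [np.mean(h) + multiplier_bootstrap_bonus(h, delta, rng=rng)
                     for h in history]
            arm = int(np.argmax(index))
        # Assumption: unit-variance Gaussian rewards, chosen only for the demo.
        reward = rng.normal(arm_means[arm], 1.0)
        history[arm].append(reward)
        pulls[arm] += 1
    return pulls
```

Calling `run_bandit([0.2, 0.5, 0.8])` returns the number of pulls per arm. The bonus shrinks as an arm accumulates observations because it is computed from that arm's own centered rewards, not from a worst-case concentration inequality.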

Keywords

multi-armed bandits bootstrap inference online learning

Cite

@article{arxiv.1906.05247,
  title  = {Bootstrapping Upper Confidence Bound},
  author = {Botao Hao and Yasin Abbasi-Yadkori and Zheng Wen and Guang Cheng},
  journal= {arXiv preprint arXiv:1906.05247},
  year   = {2019}
}

Comments

Accepted by NeurIPS 2019
