Meta-Thompson Sampling

Branislav Kveton; Mikhail Konobeev; Manzil Zaheer; Chih-wei Hsu; Martin Mladenov; Craig Boutilier; Csaba Szepesvari

Machine Learning, 2021-06-24, v2

Abstract

Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. The algorithm meta-learns the prior and thus we call it MetaTS. We propose several efficient implementations of MetaTS and analyze it in Gaussian bandits. Our analysis shows the benefit of meta-learning and is of broader interest, because we derive a novel prior-dependent Bayes regret bound for Thompson sampling. Our theory is complemented by empirical evaluation, which shows that MetaTS quickly adapts to the unknown prior.
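
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a K-armed Gaussian bandit with known reward noise `sigma`, instance means drawn from an unknown prior N(mu_star, sigma0^2 I), and a Gaussian meta-prior N(0, sigma_q^2 I) over mu_star. The function names (`run_ts_task`, `meta_ts`) and all default parameter values are hypothetical, chosen only for illustration. Under these assumptions, the average reward of an arm pulled T times in a task is distributed as N(mu_star_i, sigma0^2 + sigma^2 / T), which gives a closed-form update of the meta-posterior after each task.

```python
import numpy as np

def run_ts_task(theta, prior_mean, sigma0, sigma, n, rng):
    """Gaussian Thompson sampling on one task; returns per-arm pull counts and reward sums."""
    K = len(theta)
    pulls = np.zeros(K)
    sums = np.zeros(K)
    for _ in range(n):
        # Per-arm posterior given the prior N(prior_mean, sigma0^2) and rewards so far.
        post_var = 1.0 / (1.0 / sigma0**2 + pulls / sigma**2)
        post_mean = post_var * (prior_mean / sigma0**2 + sums / sigma**2)
        # Sample arm means from the posterior and act greedily on the sample.
        arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        reward = rng.normal(theta[arm], sigma)
        pulls[arm] += 1
        sums[arm] += reward
    return pulls, sums

def meta_ts(mu_star, num_tasks, n, sigma0=0.5, sigma=1.0, sigma_q=1.0, seed=0):
    """Sketch of meta-learning the prior mean across tasks (assumed Gaussian model)."""
    rng = np.random.default_rng(seed)
    K = len(mu_star)
    meta_mean = np.zeros(K)              # meta-posterior mean over mu_star
    meta_var = np.full(K, sigma_q**2)    # meta-posterior variance (per arm)
    for _ in range(num_tasks):
        theta = rng.normal(mu_star, sigma0)              # new instance from the unknown prior
        mu_s = rng.normal(meta_mean, np.sqrt(meta_var))  # sampled prior mean for this task
        pulls, sums = run_ts_task(theta, mu_s, sigma0, sigma, n, rng)
        # Precision contributed by each arm's average reward; zero for unpulled arms.
        prec = pulls / (pulls * sigma0**2 + sigma**2)
        avg = np.divide(sums, pulls, out=np.zeros(K), where=pulls > 0)
        new_var = 1.0 / (1.0 / meta_var + prec)
        meta_mean = new_var * (meta_mean / meta_var + prec * avg)
        meta_var = new_var
    return meta_mean, meta_var
```

Calling `meta_ts(np.array([0.0, 1.0]), num_tasks=20, n=30)` returns the meta-posterior mean and variance per arm. The variance can only shrink from task to task, which is one way to picture the adaptation to the unknown prior that the abstract describes.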

Keywords

contextual bandits, multi-armed bandit, machine learning theory

Cite

@article{arxiv.2102.06129,
  title  = {Meta-Thompson Sampling},
  author = {Branislav Kveton and Mikhail Konobeev and Manzil Zaheer and Chih-wei Hsu and Martin Mladenov and Craig Boutilier and Csaba Szepesvari},
  journal= {arXiv preprint arXiv:2102.06129},
  year   = {2021}
}

Comments

Proceedings of the 38th International Conference on Machine Learning
