
Distributed Thompson Sampling

Subjects: Artificial Intelligence, Machine Learning. Version v2, 2021-09-10.

Abstract

We study a cooperative multi-agent multi-armed bandit problem with M agents and K arms, in which the agents' goal is to minimize the cumulative regret. We adapt the traditional Thompson Sampling algorithm to the distributed setting. Because the agents can communicate, we note that communication may further reduce the upper bound on the regret of a distributed Thompson Sampling approach. To improve the performance of distributed Thompson Sampling further, we propose a distributed Elimination-based Thompson Sampling algorithm that allows the agents to learn collaboratively. We analyse the algorithm under Bernoulli rewards and derive a problem-dependent upper bound on the cumulative regret.
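To make the setting concrete, here is a minimal sketch of Thompson Sampling with Beta priors on Bernoulli rewards, run by M agents that pool their observations. It is not the authors' algorithm (the elimination step is not shown), and it assumes full communication: after every round, each agent's (arm, reward) pair is broadcast so all agents share one Beta posterior. The function name `distributed_thompson_sampling` and its parameters are hypothetical.

```python
import random


def distributed_thompson_sampling(means, num_agents, horizon, seed=0):
    """Illustrative sketch (hypothetical helper, not the paper's method).

    M = num_agents agents play K = len(means) Bernoulli arms for `horizon`
    rounds. Assumption: all observations are shared after each round, so the
    agents maintain a single Beta(1 + successes, 1 + failures) posterior per arm.
    Returns the cumulative pseudo-regret and the shared success/failure counts.
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k  # shared posterior: Beta(1 + successes[a], 1 + failures[a])
    failures = [0] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        round_obs = []
        for _ in range(num_agents):
            # Each agent draws one posterior sample per arm and plays the argmax.
            samples = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)]
            arm = max(range(k), key=lambda a: samples[a])
            reward = 1 if rng.random() < means[arm] else 0
            round_obs.append((arm, reward))
            regret += best - means[arm]  # pseudo-regret of this pull
        # Communication step (assumed): broadcast this round's observations.
        for arm, reward in round_obs:
            if reward:
                successes[arm] += 1
            else:
                failures[arm] += 1
    return regret, successes, failures
```

For example, `distributed_thompson_sampling([0.3, 0.5, 0.7], num_agents=4, horizon=500)` simulates 2000 pulls in total. Within a round, the agents sample from the same posterior independently, so they can still explore different arms. Sharing observations between rounds is the kind of cooperation the abstract suggests can tighten the regret bound.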


Cite

@article{arxiv.2012.01789,
  title  = {Distributed Thompson Sampling},
  author = {Jing Dong and Tan Li and Shaolei Ren and Linqi Song},
  journal= {arXiv preprint arXiv:2012.01789},
  year   = {2021}
}

Comments

The paper is unfinished and will not be updated.