Learning under Invariable Bayesian Safety

Gal Bahar; Omer Ben-Porat; Kevin Leyton-Brown; Moshe Tennenholtz

Computer Science and Game Theory | 2020-06-09 | v1 | Artificial Intelligence | Computers and Society | Machine Learning

Abstract

A recent body of work addresses safety constraints in explore-and-exploit systems. Such constraints arise where, for example, exploration is carried out by individuals whose welfare should be balanced against overall welfare. In this paper, we adopt a model inspired by recent work on a bandit-like setting for recommendations. We contribute to this line of literature by introducing a safety constraint that must be respected in every round: the expected value in each round must be above a given threshold. Under this model, a safe explore-and-exploit policy requires careful planning; otherwise, it leads to sub-optimal welfare. We devise an asymptotically optimal algorithm for the setting and analyze its instance-dependent convergence rate.
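
To make the per-round constraint concrete, the following is a minimal sketch of one plausible reading of it, not the paper's algorithm. It assumes the planner recommends an arm drawn from a probability distribution over arms and that the round is safe when the expected reward under that distribution is at least the threshold. The function names, the two-arm example, and the numbers are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def is_round_safe(expected_rewards: Sequence[float],
                  portfolio: Sequence[float],
                  threshold: float,
                  tol: float = 1e-12) -> bool:
    """Check an assumed per-round safety condition.

    expected_rewards[i] is the planner's current expected reward of arm i,
    portfolio[i] is the probability of recommending arm i this round.
    The round counts as safe if the portfolio's expected reward is at
    least `threshold` (up to a small numerical tolerance).
    """
    if len(expected_rewards) != len(portfolio):
        raise ValueError("expected_rewards and portfolio must have equal length")
    if any(p < 0 for p in portfolio) or abs(sum(portfolio) - 1.0) > 1e-9:
        raise ValueError("portfolio must be a probability distribution")
    value = sum(p * m for p, m in zip(portfolio, expected_rewards))
    return value >= threshold - tol


def max_safe_exploration_prob(safe_mean: float,
                              risky_mean: float,
                              threshold: float) -> float:
    """Largest probability q of recommending the risky arm such that
    q * risky_mean + (1 - q) * safe_mean >= threshold (two-arm case)."""
    if risky_mean >= threshold:
        return 1.0   # the risky arm is safe on its own
    if safe_mean < threshold:
        return 0.0   # no mixture of the two arms can reach the threshold
    return (safe_mean - threshold) / (safe_mean - risky_mean)


# Hypothetical two-arm instance: arm 0 looks good, arm 1 looks bad a priori.
means = [0.5, -0.2]
q = max_safe_exploration_prob(means[0], means[1], threshold=0.0)
print(is_round_safe(means, [0.0, 1.0], threshold=0.0))    # explore arm 1 alone
print(is_round_safe(means, [1.0 - q, q], threshold=0.0))  # mix at the safe limit
```

In this toy instance, recommending the a priori bad arm on its own violates the threshold, while mixing it with the good arm at a bounded probability does not. This is one way to see why exploration under such a constraint has to be planned rather than done greedily.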

Keywords

contextual bandits; multi-armed bandit; Bayesian persuasion

Cite

@article{arxiv.2006.04497,
  title  = {Learning under Invariable Bayesian Safety},
  author = {Gal Bahar and Omer Ben-Porat and Kevin Leyton-Brown and Moshe Tennenholtz},
  journal= {arXiv preprint arXiv:2006.04497},
  year   = {2020}
}
