Smooth Sequential Optimisation with Delayed Feedback

Srivas Chennu; Jamie Martin; Puli Liyanagama; Phil Mohr

Machine Learning 2021-06-23 v2

Abstract

Stochastic delays in feedback lead to unstable sequential learning in multi-armed bandits. Recently, empirical Bayesian shrinkage has been shown to improve reward estimation in bandit learning. Here, we propose a novel adaptation of shrinkage that computes smoothed reward estimates from windowed cumulative inputs, to deal with incomplete knowledge from delayed feedback and with non-stationary rewards. Using numerical simulations, we show that this adaptation retains the benefits of shrinkage and improves the stability of reward estimation by more than 50%. Our proposal reduces variability in treatment allocations to the best arm by up to 3.8x and improves statistical accuracy, with up to 8% improvement in true positive rates and 37% reduction in false positive rates. Together, these advantages enable control of the trade-off between speed and stability of adaptation, and facilitate human-in-the-loop sequential optimisation.
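The abstract names the ingredients (shrinkage applied to windowed cumulative inputs) but not the estimator itself. As an illustration only, the sketch below assumes Bernoulli rewards, a beta-binomial empirical Bayes prior fitted across arms by the method of moments, and a fixed window over per-period counts. The functions `windowed_total` and `shrunk_rates` and the `window` parameter are hypothetical, and this is not necessarily the authors' exact method.

```python
from typing import List, Sequence


def windowed_total(series: Sequence[int], window: int) -> int:
    """Sum the most recent `window` entries of a per-period count series."""
    return sum(series[-window:])


def shrunk_rates(successes: Sequence[Sequence[int]],
                 trials: Sequence[Sequence[int]],
                 window: int) -> List[float]:
    """Hypothetical sketch: empirical-Bayes (beta-binomial) shrinkage of
    per-arm success rates, computed from windowed cumulative counts."""
    s = [windowed_total(arm, window) for arm in successes]
    n = [windowed_total(arm, window) for arm in trials]
    rates = [si / ni for si, ni in zip(s, n)]  # assumes every windowed ni > 0
    k = len(rates)
    m = sum(rates) / k  # grand mean that estimates are shrunk towards
    # Observed spread of arm rates, minus the average binomial sampling noise,
    # approximates the true between-arm variance (floored to stay positive).
    obs_var = sum((r - m) ** 2 for r in rates) / k
    noise_var = sum(r * (1 - r) / ni for r, ni in zip(rates, n)) / k
    between_var = max(obs_var - noise_var, 1e-9)
    # Method-of-moments Beta prior: strength = alpha + beta.
    strength = max(m * (1 - m) / between_var - 1.0, 0.0)
    alpha, beta = m * strength, (1 - m) * strength
    # Posterior mean per arm: a weighted blend of its raw rate and the grand mean.
    return [(si + alpha) / (ni + alpha + beta) for si, ni in zip(s, n)]


# Usage: three arms, three periods of counts, shrinkage over the last two periods.
estimates = shrunk_rates(
    successes=[[5, 6, 4], [8, 7, 9], [3, 2, 4]],
    trials=[[100, 100, 100]] * 3,
    window=2,
)
print(estimates)
```

In this sketch, widening the window pools more periods into each estimate, which smooths over incomplete recent feedback but adapts more slowly to non-stationary rewards; that is one way to expose the speed-versus-stability trade-off the abstract describes.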

Keywords

multi-armed bandit, contextual bandits, randomized algorithm

Cite

@article{arxiv.2106.11294,
  title  = {Smooth Sequential Optimisation with Delayed Feedback},
  author = {Srivas Chennu and Jamie Martin and Puli Liyanagama and Phil Mohr},
  journal= {arXiv preprint arXiv:2106.11294},
  year   = {2021}
}

Comments

Workshop on Bayesian causal inference for real world interactive systems, 27th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021)