An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge

Machine Learning 2023-02-21 v3 Machine Learning

Abstract

We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance exploration and exploitation. It adapts to non-stationarity by restarting when a change in the reward function is detected. Our algorithm enjoys a tighter dynamic regret bound than previous work on the non-stationary kernel bandit setting. Moreover, when applied to the non-stationary linear bandit setting by using a linear kernel, our algorithm is nearly minimax optimal, solving an open problem in the non-stationary linear bandit literature. We extend our algorithm to use a neural network that dynamically adapts the feature mapping to observed data. We prove a dynamic regret bound for the extension using neural tangent kernel theory. We demonstrate empirically that our algorithm and the extension can adapt to varying degrees of non-stationarity.
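
The restart-on-detection idea mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm or its detection test: the class name `RestartingMeanEstimator`, the sliding-window comparison, and the `window` and `threshold` parameters are all assumptions made for illustration. It tracks the mean reward of a single action and discards its history when recent rewards disagree with the long-run estimate.

```python
from collections import deque


class RestartingMeanEstimator:
    """Toy change detector for one action's rewards.

    Hypothetical illustration only: the window/threshold rule below is an
    assumption, not the detection test used in the paper.
    """

    def __init__(self, window=5, threshold=0.5):
        self.window = window
        self.threshold = threshold
        self.restarts = 0
        self._reset()

    def _reset(self):
        # Forget everything learned so far.
        self.total = 0.0
        self.count = 0
        self.recent = deque(maxlen=self.window)

    def update(self, reward):
        """Record one reward; return True if a restart was triggered."""
        self.recent.append(reward)
        self.total += reward
        self.count += 1
        # Wait until there is enough history to compare against.
        if self.count < 2 * self.window:
            return False
        long_run_mean = self.total / self.count
        recent_mean = sum(self.recent) / len(self.recent)
        if abs(recent_mean - long_run_mean) > self.threshold:
            self.restarts += 1
            self._reset()
            return True
        return False

    def mean(self):
        """Current reward estimate (0.0 right after a restart)."""
        return self.total / self.count if self.count else 0.0


# Rewards that jump from 0 to 1 halfway through.
estimator = RestartingMeanEstimator(window=5, threshold=0.5)
flags = [estimator.update(r) for r in [0.0] * 20 + [1.0] * 20]
print(estimator.restarts, estimator.mean())
```

In this sketch the estimator needs no advance knowledge of when or how often the rewards change; it reacts only to the observed data, which is the property the abstract emphasizes.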

Cite

@article{arxiv.2205.14775,
  title  = {An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge},
  author = {Kihyuk Hong and Yuhang Li and Ambuj Tewari},
  journal= {arXiv preprint arXiv:2205.14775},
  year   = {2023}
}