Instance-dependent Stochastic Lipschitz bandit
Abstract
We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain of dimension $d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case, scaling as $\tilde{O}(T^{(d+1)/(d+2)})$, or adaptive via the zooming dimension $d_z$, yielding $\tilde{O}(T^{(d_z+1)/(d_z+2)})$. However, such zooming-based guarantees are only partially instance-dependent: they depend solely on the asymptotic growth of near-optimal level sets and fail to capture finer structural properties of $f$. We provide an analysis and an algorithm that characterize the regret through integrals of the suboptimality gap of $f$ over its level sets. This yields regret bounds that adapt to the local growth of level sets rather than only to their asymptotic behavior. As a corollary, when the set of maximizers has dimension , we obtain improved adaptive rates of order , strictly improving over classical zooming bounds in this regime. Finally, we extend our analysis to the full-information setting (Lipschitz experts) and show how some of the regularity assumptions can be relaxed.
Cite
@article{arxiv.2605.29748,
title = {Instance-dependent Stochastic Lipschitz bandit},
author = {Marius Potfer and Vianney Perchet},
journal= {arXiv preprint arXiv:2605.29748},
year = {2026}
}