arXiv:2605.29748 [stat.ML]

Instance-dependent Stochastic Lipschitz bandit

stat.ML (Machine Learning) · 2026-05 · v1

Abstract

We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain $\mathcal{X} \subset [0,1]^d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case, scaling as $\tilde{\Theta}\big(T^{(d+1)/(d+2)}\big)$, or adaptive via the zooming dimension $d_z$, yielding $\tilde{\Theta}\big(T^{(d_z+1)/(d_z+2)}\big)$. However, such zooming-based guarantees are only partially instance-dependent: they depend solely on the asymptotic growth of near-optimal level sets and fail to capture finer structural properties of $f$. We provide an analysis and an algorithm that characterize the regret through integrals of the suboptimality gap of $f$ over its level sets. This yields regret bounds that adapt to the local growth of level sets, rather than only to their asymptotic behavior. As a corollary, when the set of maximizers has dimension $d^\star > 0$, we obtain improved adaptive rates of order $\tilde{\mathcal{O}}\big(T^{(d_z+1)/(\max(d_z,d^\star)+2)}\big)$, strictly improving over classical zooming bounds in this regime. Finally, we extend our analysis to the full-information setting (Lipschitz experts) and show how some of the regularity assumptions can be relaxed.
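The rates quoted above differ only in the exponent of $T$. As an illustration (not the authors' algorithm or analysis), the sketch below evaluates those exponents exactly, assuming the extracted `d+1/d+2` is read as $(d+1)/(d+2)$, and reproduces the textbook uniform-discretization balance behind the worst-case rate. All function names are hypothetical, and constants and log factors are ignored.

```python
from fractions import Fraction
from typing import Tuple


def worst_case_exponent(d: int) -> Fraction:
    """Exponent of T in the worst-case rate T^{(d+1)/(d+2)}."""
    return Fraction(d + 1, d + 2)


def zooming_exponent(d_z: int) -> Fraction:
    """Exponent of T in the zooming rate T^{(d_z+1)/(d_z+2)}."""
    return Fraction(d_z + 1, d_z + 2)


def corollary_exponent(d_z: int, d_star: int) -> Fraction:
    """Exponent (d_z+1)/(max(d_z, d_star)+2), as read from the abstract."""
    return Fraction(d_z + 1, max(d_z, d_star) + 2)


def uniform_grid_exponents(d: int) -> Tuple[Fraction, Fraction]:
    """Balance the two error terms of a textbook uniform-grid argument.

    A grid of spacing eps = T^(-a) costs about T * eps = T^(1 - a) in
    discretization error (f is Lipschitz), and about sqrt(K * T) with
    K = eps^(-d) = T^(a*d) cells for a finite-armed bandit run on the grid,
    i.e. T^((a*d + 1)/2), ignoring log factors and constants.
    Equating 1 - a = (a*d + 1)/2 gives a = 1/(d+2).
    """
    a = Fraction(1, d + 2)
    discretization = 1 - a          # exponent of T * eps
    estimation = (a * d + 1) / 2    # exponent of sqrt(K * T)
    return discretization, estimation


if __name__ == "__main__":
    for d in (1, 2, 3):
        disc, est = uniform_grid_exponents(d)
        # Both terms land on the same exponent, (d+1)/(d+2).
        print(d, disc, est, worst_case_exponent(d))
    # When d_star > d_z, the exponent read from the abstract is smaller.
    print(zooming_exponent(1), corollary_exponent(1, 2))  # 2/3 1/2
```

Exact `Fraction` arithmetic is used so that the equality of the two balanced exponents can be checked without floating-point rounding.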

Cite

@article{arxiv.2605.29748,
  title  = {Instance-dependent Stochastic Lipschitz bandit},
  author = {Marius Potfer and Vianney Perchet},
  journal= {arXiv preprint arXiv:2605.29748},
  year   = {2026}
}