Related papers

Related papers: Kernel-based methods for bandit convex optimizatio…

200 papers

We consider the problem of online convex optimization against an arbitrary adversary with bandit feedback, known as bandit convex optimization. We give the first $\tilde{O}(\sqrt{T})$-regret algorithm for this setting based on a novel…

Machine Learning · Computer Science · 2016-03-16 · Elad Hazan, Yuanzhi Li
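The entry above concerns bandit feedback, where the learner sees only the loss value at the point it plays. As background only, and not the algorithm of Hazan and Li (whose technique the snippet elides), here is a minimal sketch of the classical one-point spherical gradient estimator of Flaxman, Kalai and McMahan (2005). It assumes a loss `f` on $\mathbb{R}^d$ and a smoothing radius `delta`; the function names are hypothetical.

```python
import numpy as np

def sample_unit_sphere(d, rng):
    """Draw a direction uniformly at random from the unit sphere in R^d."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def one_point_gradient(f, x, delta, rng):
    """Return (d / delta) * f(x + delta * u) * u for a random unit vector u.

    Needs only ONE loss value per round (bandit feedback). Its expectation is
    the gradient of the smoothed loss x -> E[f(x + delta * v)], v uniform in
    the unit ball, not the gradient of f itself.
    """
    d = x.shape[0]
    u = sample_unit_sphere(d, rng)
    return (d / delta) * f(x + delta * u) * u

# For a linear loss f(x) = <c, x> the smoothed gradient is exactly c,
# so averaging many one-point estimates should approach c.
rng = np.random.default_rng(0)
c = np.array([1.0, -2.0, 0.5])
f = lambda x: float(c @ x)
x0 = np.zeros(3)
est = np.mean([one_point_gradient(f, x0, 0.1, rng) for _ in range(100_000)], axis=0)
```

Because the estimate's magnitude scales like $d\,|f(x+\delta u)|/\delta$, it becomes noisier as `delta` shrinks, which is the usual trade-off between smoothing bias and variance in this setting.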

We present an efficient algorithm for linear contextual bandits with adversarial losses and stochastic action sets. Our approach reduces this setting to misspecification-robust adversarial linear bandits with fixed action sets. Without…

Machine Learning · Computer Science · 2025-12-16 · Tim van Erven, Jack Mayo, Julia Olkhovskaya, Chen-Yu Wei

Unlike the problems studied in classical control theory, such as Linear Quadratic Control (LQC), real-world control problems are highly complex. These problems often involve adversarial perturbations, bandit feedback models, and non-quadratic, adversarially chosen…

Machine Learning · Computer Science · 2024-10-03 · Y. Jennifer Sun, Zhou Lu

We study the control of an unknown linear dynamical system under general convex costs. The objective is to minimize regret against the class of disturbance-feedback controllers, which encompasses all stabilizing…

Machine Learning · Computer Science · 2020-10-30 · Orestis Plevrakis, Elad Hazan

We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is $\widetilde\Theta(\sqrt{T})$ and partially resolve a decade-old open problem. Our…

Machine Learning · Computer Science · 2015-02-24 · Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres

We investigate bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed under an arbitrary delay. Let $n,T,\bar{d}$ denote the dimensionality, time horizon, and average delay, respectively.…

Machine Learning · Computer Science · 2024-06-25 · Yuanyu Wan, Chang Yao, Mingli Song, Lijun Zhang

We revisit the challenge of designing online algorithms for the bandit convex optimization (BCO) problem that also scale to high-dimensional problems. Hence, we consider algorithms that are projection-free, i.e., based on…

Machine Learning · Computer Science · 2019-10-09 · Dan Garber, Ben Kretzu

Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets $\mathcal{K}\subset\mathbb{R}^n$ and two types of structural assumptions lead to better pseudo-regret bounds. When…

Machine Learning · Computer Science · 2021-03-11 · Thomas Kerdreux, Christophe Roux, Alexandre d'Aspremont, Sebastian Pokutta

We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where cost/reward functions $f_t$ admit a "pseudo-1d" structure, i.e., $f_t(\mathbf{w}) = \ell_t(\mathrm{pred}_t(\mathbf{w}))$, where the output of $\mathrm{pred}_t$ is…

Machine Learning · Computer Science · 2021-02-16 · Aadirupa Saha, Nagarajan Natarajan, Praneeth Netrapalli, Prateek Jain

This paper studies bandit convex optimization in non-stationary environments with two-point feedback, using dynamic regret as the performance measure. We propose an algorithm based on bandit mirror descent that extends naturally to…

Optimization and Control · Mathematics · 2026-05-26 · Chang He, Bo Jiang, Shuzhong Zhang
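Two-point feedback, as in the entry above, means the learner may evaluate the same round's loss at two points. The paper's mirror-descent algorithm is not reproduced here; this is only a minimal sketch of the standard two-point spherical estimator in the style of Agarwal, Dekel and Xiao (2010), assuming a loss `f` on $\mathbb{R}^d$ and a radius `delta` (the function name is hypothetical).

```python
import numpy as np

def two_point_gradient(f, x, delta, rng):
    """Return (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u.

    Uses two loss evaluations at the same round. For an L-Lipschitz f the
    norm of the estimate is at most d * L, regardless of delta.
    """
    d = x.shape[0]
    v = rng.standard_normal(d)
    u = v / np.linalg.norm(v)  # uniform direction on the unit sphere
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# For f(x) = 0.5 * ||x||^2 the estimate equals d * <x, u> * u, whose mean is x,
# the true gradient.
rng = np.random.default_rng(0)
f = lambda x: 0.5 * float(x @ x)
x0 = np.array([1.0, -1.0, 2.0, 0.5])
est = np.mean([two_point_gradient(f, x0, 0.01, rng) for _ in range(100_000)], axis=0)
```

The difference of two evaluations cancels the $1/\delta$ blow-up that affects a single evaluation, which is why two-point feedback typically permits sharper regret guarantees than one-point feedback.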

This paper considers online convex optimization over a complicated constraint set, which typically consists of multiple functional constraints and a set constraint. The conventional online projection algorithm (Zinkevich, 2003) can be…

Optimization and Control · Mathematics · 2020-05-19 · Hao Yu, Michael J. Neely
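The entry above contrasts its method with the online projection algorithm (Zinkevich, 2003), whose update is $x_{t+1} = \Pi_{\mathcal{K}}(x_t - \eta_t \nabla f_t(x_t))$. A minimal sketch of that baseline follows, under the assumption that $\mathcal{K}$ is a Euclidean ball, one of the few sets where projection is a closed-form rescaling (the complicated constraint sets the paper targets are exactly where this step becomes expensive). Step size $\eta_t = D/(G\sqrt{t})$ is the standard choice; all function names are hypothetical.

```python
import numpy as np

def project_to_ball(y, radius):
    """Euclidean projection onto {x : ||x|| <= radius} (closed form)."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def online_projected_gradient(grads, x0, radius, lipschitz):
    """Online projected gradient descent over a Euclidean ball.

    grads: sequence of callables; grads[t](x) is the gradient of the round-t
    loss at x. Step size eta_t = D / (G * sqrt(t)), with D = 2 * radius the
    diameter and G = lipschitz a bound on gradient norms over the ball.
    Returns the list of points played, one per round.
    """
    x = project_to_ball(np.asarray(x0, dtype=float), radius)
    played = []
    for t, grad in enumerate(grads, start=1):
        played.append(x.copy())
        eta = (2.0 * radius) / (lipschitz * np.sqrt(t))
        x = project_to_ball(x - eta * grad(x), radius)
    return played

# Losses f_t(x) = ||x - z||^2 whose unconstrained minimizer lies outside the
# unit ball; the gradient norm on the ball is at most 2 * (1 + 2) = 6.
z = np.array([2.0, 0.0])
grads = [lambda x: 2.0 * (x - z)] * 50
played = online_projected_gradient(grads, x0=[0.0, 0.0], radius=1.0, lipschitz=6.0)
# played[-1] is (numerically) the constrained minimizer [1.0, 0.0]
```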

We develop a reduction-based framework for online learning with delayed feedback that recovers and improves upon existing results for both first-order and bandit convex optimization. Our approach introduces a continuous-time model under…

Machine Learning · Computer Science · 2026-02-04 · Alexander Ryabchenko, Idan Attias, Daniel M. Roy

We show that a kernel estimator using multiple function evaluations can be easily converted into a sampling-based bandit estimator with expectation equal to the original kernel estimate. Plugging such a bandit estimator into the standard…

Machine Learning · Computer Science · 2023-06-27 · David Young, Douglas Leith, George Iosifidis
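The entry above converts a multi-evaluation estimator into a bandit estimator with the same expectation. The snippet does not give the authors' construction, so this is only a sketch of the generic importance-sampling idea behind such conversions, assuming (hypothetically) that the multi-point estimate has the form $\sum_i w_i f(x_i) v_i$: query a single index $i$ with probability $p_i$ and return $(w_i/p_i) f(x_i) v_i$. All names, points, weights and probabilities below are illustrative.

```python
import numpy as np

def full_estimate(f, points, weights, vectors):
    """Multi-evaluation estimate sum_i w_i * f(x_i) * v_i (needs every f(x_i))."""
    return sum(w * f(x) * v for x, w, v in zip(points, weights, vectors))

def sampled_estimate(f, points, weights, vectors, probs, rng):
    """Bandit version: evaluate f at ONE point x_i, chosen with probability p_i,
    and reweight by 1 / p_i so the expectation equals full_estimate."""
    i = rng.choice(len(points), p=probs)
    return (weights[i] / probs[i]) * f(points[i]) * vectors[i]

rng = np.random.default_rng(1)
f = lambda x: float(np.sum(x ** 2))
points = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([-1.0, 1.0])]
weights = [0.5, -0.25, 1.0]
vectors = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
probs = np.array([0.2, 0.3, 0.5])
avg = np.mean([sampled_estimate(f, points, weights, vectors, probs, rng)
               for _ in range(50_000)], axis=0)
```

Unbiasedness holds for any positive `probs`, but the $1/p_i$ reweighting inflates variance when some $p_i$ is small, so the sampling distribution matters in practice.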

We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation and prove that in the adversarial setting its regret is at most $d^{3.5} \sqrt{n} \mathrm{polylog}(n, d)$ with high probability, where $d$ is the…

Optimization and Control · Mathematics · 2024-06-11 · Hidde Fokkema, Dirk van der Hoeven, Tor Lattimore, Jack J. Mayo

We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback, without prior knowledge on transitions or access to simulators. We introduce two algorithms that achieve improved regret…

Machine Learning · Computer Science · 2023-10-19 · Haolin Liu, Chen-Yu Wei, Julian Zimmert

We study the adversarial kernel bandit problem, in which the loss at each round is induced by an arbitrary bounded element of a reproducing kernel Hilbert space (RKHS). We propose an exponential-weights algorithm built on a regularized…

Machine Learning · Computer Science · 2026-05-27 · Yu-Jie Zhang, Hao Qiu, Jonathan Scarlett, Kevin Jamieson

We consider online convex optimization with zero-order oracle feedback. In particular, the decision maker does not know the explicit representation of the time-varying cost functions or their gradients. At each time step, she observes…

Optimization and Control · Mathematics · 2020-05-05 · Tatiana Tatarenko, Maryam Kamgarpour

We consider the problem of combining and learning over a set of adversarial bandit algorithms with the goal of adaptively tracking the best one on the fly. The CORRAL algorithm of Agarwal et al. (2017) and its variants (Foster et al.,…

Machine Learning · Computer Science · 2022-02-15 · Haipeng Luo, Mengxiao Zhang, Peng Zhao, Zhi-Hua Zhou

A new algorithm for regret minimization in online convex optimization is described. The regret of the algorithm after $T$ time periods is $O(\sqrt{T \log T})$, which is the minimum possible up to a logarithmic term. In addition, the new…

Machine Learning · Computer Science · 2023-07-24 · Elad Hazan, Nimrod Megiddo

We introduce a simple and efficient algorithm for unconstrained zeroth-order stochastic convex bandits and prove its regret is at most $(1 + r/d)[d^{1.5} \sqrt{n} + d^3] \mathrm{polylog}(n, d, r)$ where $n$ is the horizon, $d$ the dimension and $r$…

Machine Learning · Computer Science · 2023-02-13 · Tor Lattimore, András György