Related papers: Efficient Algorithms for Adversarial Contextual Le…

Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits

We give an oracle-based algorithm for the adversarial contextual bandit problem, where either contexts are drawn i.i.d. or the sequence of contexts is known a priori, but where the losses are picked adversarially. Our algorithm is…

Machine Learning · Computer Science · 2016-06-02 · Vasilis Syrgkanis, Haipeng Luo, Akshay Krishnamurthy, Robert E. Schapire

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action. Our method assumes access…

Machine Learning · Computer Science · 2014-10-15 · Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

Efficient Optimal Learning for Contextual Bandits

We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal…

Machine Learning · Computer Science · 2011-06-17 · Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, Tong Zhang

Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

We study the $K$-armed contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one…

Machine Learning · Computer Science · 2021-11-25 · Aadirupa Saha, Akshay Krishnamurthy

Contextual Bandits and Imitation Learning via Preference-Based Active Queries

We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward. Instead, the learner can actively query an expert at each round to compare two actions and…

Machine Learning · Computer Science · 2023-07-25 · Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu

An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction

We present an efficient algorithm for linear contextual bandits with adversarial losses and stochastic action sets. Our approach reduces this setting to misspecification-robust adversarial linear bandits with fixed action sets. Without…

Machine Learning · Computer Science · 2025-12-16 · Tim van Erven, Jack Mayo, Julia Olkhovskaya, Chen-Yu Wei

Contextual Bandits with Cross-learning

In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $i$ to perform, and receives some reward $r_{i,t}(c)$. We consider the variant of this problem where in addition to…

Machine Learning · Computer Science · 2021-11-17 · Santiago Balseiro, Negin Golrezaei, Mohammad Mahdian, Vahab Mirrokni, Jon Schneider

Contextual Semibandits via Supervised Learning Oracles

We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and a reward that is linearly related to this…

Machine Learning · Computer Science · 2016-11-07 · Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudik

Group-wise oracle-efficient algorithms for online multi-group learning

We study the problem of online multi-group learning, a learning model in which an online learner must simultaneously achieve small prediction regret on a large collection of (possibly overlapping) subsequences corresponding to a family of…

Machine Learning · Computer Science · 2025-07-16 · Samuel Deng, Daniel Hsu, Jingwen Liu

Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces

Designing efficient general-purpose contextual bandit algorithms that work with large -- or even continuous -- action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and…

Machine Learning · Computer Science · 2022-07-14 · Yinglun Zhu, Paul Mineiro

Contextual Recommendations and Low-Regret Cutting-Plane Algorithms

We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems. We wish to learn a hidden $d$-dimensional value $w^*$. Every round, we are presented with a…

Machine Learning · Computer Science · 2021-06-10 · Sreenivas Gollapudi, Guru Guruganesh, Kostas Kollias, Pasin Manurangsi, Renato Paes Leme, Jon Schneider

Context-lumpable stochastic bandits

We consider a contextual bandit problem with $S$ contexts and $K$ actions. In each round $t=1,2,\dots$, the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward…

Machine Learning · Computer Science · 2023-11-29 · Chung-Wei Lee, Qinghua Liu, Yasin Abbasi-Yadkori, Chi Jin, Tor Lattimore, Csaba Szepesvári

Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits

We consider the adversarial linear contextual bandit problem, where the loss vectors are selected fully adversarially and the per-round action set (i.e. the context) is drawn from a fixed distribution. Existing methods for this problem…

Machine Learning · Computer Science · 2023-09-06 · Haolin Liu, Chen-Yu Wei, Julian Zimmert

Contextual Bandit Learning with Predictable Rewards

Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on the action and context. We consider this problem under a…

Machine Learning · Computer Science · 2012-03-05 · Alekh Agarwal, Miroslav Dudík, Satyen Kale, John Langford, Robert E. Schapire

Adversarial Contextual Bandits Go Kernelized

We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex…

Machine Learning · Statistics · 2023-10-04 · Gergely Neu, Julia Olkhovskaya, Sattar Vakili

Efficient Contextual Bandits in Non-stationary Worlds

Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for non-stationary environments that are ubiquitous in applications. In this work, we develop several efficient contextual bandit…

Machine Learning · Computer Science · 2019-04-05 · Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford

Adversarial Linear Contextual Bandits with Graph-Structured Side Observations

This paper studies the adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverage two categories of the most common side information: contexts and side observations. In this setting, a…

Machine Learning · Computer Science · 2021-02-18 · Lingda Wang, Bingcong Li, Huozhi Zhou, Georgios B. Giannakis, Lav R. Varshney, Zhizhen Zhao

Oracle-Efficient Combinatorial Semi-Bandits

We study the combinatorial semi-bandit problem where an agent selects a subset of base arms and receives individual feedback. While this generalizes the classical multi-armed bandit and has broad applicability, its scalability is limited by…

Machine Learning · Statistics · 2025-10-27 · Jung-hun Kim, Milan Vojnović, Min-hwan Oh

OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed…

Machine Learning · Statistics · 2020-10-07 · Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett

Online Learning with Off-Policy Feedback

We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead…

Machine Learning · Computer Science · 2022-07-20 · Germano Gabbianelli, Matteo Papini, Gergely Neu