Related papers: Contextual Semibandits via Supervised Learning Ora…

From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards

We study the problem of contextual combinatorial semi-bandits, where input contexts are mapped into subsets of size $m$ of a collection of $K$ possible actions. In each round, the learner observes the realized reward of the predicted…

Machine Learning · Computer Science 2026-02-24 Liad Erez, Tomer Koren

Efficient Algorithms for Adversarial Contextual Learning

We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the…

Machine Learning · Computer Science 2016-02-09 Vasilis Syrgkanis, Akshay Krishnamurthy, Robert E. Schapire

Oracle-Efficient Combinatorial Semi-Bandits

We study the combinatorial semi-bandit problem where an agent selects a subset of base arms and receives individual feedback. While this generalizes the classical multi-armed bandit and has broad applicability, its scalability is limited by…

Machine Learning · Statistics 2025-10-27 Jung-hun Kim, Milan Vojnović, Min-hwan Oh

Online Learning with Off-Policy Feedback

We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead…

Machine Learning · Computer Science 2022-07-20 Germano Gabbianelli, Matteo Papini, Gergely Neu

Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

We study the $K$-armed contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one…

Machine Learning · Computer Science 2021-11-25 Aadirupa Saha, Akshay Krishnamurthy

Combinatorial Bandits with Relative Feedback

We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute. Specifically, we study two regret minimisation…

Machine Learning · Computer Science 2020-02-28 Aadirupa Saha, Aditya Gopalan

Individually Fair Learning with One-Sided Feedback

We consider an online learning problem with one-sided feedback, in which the learner is able to observe the true label only for positively predicted instances. On each round, $k$ instances arrive and receive classification outcomes…

Machine Learning · Computer Science 2022-06-10 Yahav Bechavod, Aaron Roth

Adapting to Misspecification in Contextual Bandits with Offline Regression Oracles

Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the bandit algorithm may incur…

Machine Learning · Computer Science 2021-06-14 Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey

Contextual Bandits with Cross-learning

In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $i$ to perform, and receives some reward $r_{i,t}(c)$. We consider the variant of this problem where in addition to…

Machine Learning · Computer Science 2021-11-17 Santiago Balseiro, Negin Golrezaei, Mohammad Mahdian, Vahab Mirrokni, Jon Schneider

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action. Our method assumes access…

Machine Learning · Computer Science 2014-10-15 Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

Contextual Bandits and Imitation Learning via Preference-Based Active Queries

We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward. Instead, the learner can actively query an expert at each round to compare two actions and…

Machine Learning · Computer Science 2023-07-25 Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu

Semiparametric Contextual Bandits

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for an action is modeled as a linear function of known action features confounded by a non-linear…

Machine Learning · Statistics 2018-07-17 Akshay Krishnamurthy, Zhiwei Steven Wu, Vasilis Syrgkanis

Contextual Bandit Learning with Predictable Rewards

Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on the action and context. We consider this problem under a…

Machine Learning · Computer Science 2012-03-05 Alekh Agarwal, Miroslav Dudík, Satyen Kale, John Langford, Robert E. Schapire

Kernelized Offline Contextual Dueling Bandits

Preference-based feedback is important for many applications where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback on large language models. For many of…

Machine Learning · Computer Science 2023-07-24 Viraj Mehta, Ojash Neopane, Vikramjeet Das, Sen Lin, Jeff Schneider, Willie Neiswanger

Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward

We consider a novel practical problem of online learning with episodically revealed rewards, motivated by several real-world applications, where the contexts are nonstationary over different episodes and the reward feedbacks are not…

Machine Learning · Computer Science 2020-10-27 Baihan Lin

Efficient Optimal Learning for Contextual Bandits

We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal…

Machine Learning · Computer Science 2011-06-17 Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, Tong Zhang

Bandits with Partially Observable Confounded Data

We study linear contextual bandits with access to a large, confounded, offline dataset that was sampled from some fixed policy. We show that this problem is closely related to a variant of the bandit problem with side information. We…

Machine Learning · Computer Science 2021-08-11 Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni

Contextual Bandit Algorithms with Supervised Learning Guarantees

We address the problem of learning in an online, bandit setting where the learner must repeatedly select among $K$ actions, but only receives partial feedback based on its choices. We establish two new facts: First, using a new algorithm…

Machine Learning · Computer Science 2011-10-28 Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, Robert E. Schapire

A Contextual Bandit Bake-off

Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes on statistically and computationally efficient methods, the practical behavior of these…

Machine Learning · Statistics 2021-06-08 Alberto Bietti, Alekh Agarwal, John Langford

Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles

A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on…

Machine Learning · Computer Science 2020-06-24 Dylan J. Foster, Alexander Rakhlin