Related papers: Learning from Logged Implicit Exploration Data

Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support

We address policy learning with logged data in contextual bandits. Current offline-policy learning algorithms are mostly based on inverse propensity score (IPS) weighting requiring the logging policy to have full support, i.e. a…

Machine Learning · Statistics 2021-07-27 Hung Tran-The, Sunil Gupta, Thanh Nguyen-Tang, Santu Rana, Svetha Venkatesh

Efficient learning by implicit exploration in bandit problems with side observations

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition…

Machine Learning · Computer Science 2026-04-28 Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos

PAC-Bayesian Offline Contextual Bandits With Guarantees

This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from intractable or loose bounds. We analyse the problem through the…

Machine Learning · Statistics 2023-05-30 Otmane Sakhi, Pierre Alquier, Nicolas Chopin

Active Learning with Logged Data

We consider active learning with logged data, where labeled examples are drawn conditioned on a predetermined logging policy, and the goal is to learn a classifier on the entire population, not just conditioned on the logging policy. Prior…

Machine Learning · Computer Science 2018-06-14 Songbai Yan, Kamalika Chaudhuri, Tara Javidi

Semi-supervised Batch Learning From Logged Data

Off-policy learning methods are intended to learn a policy from logged data, which includes context, action, and feedback (cost or reward) for each sample point. In this work, we build on the counterfactual risk minimization framework,…

Machine Learning · Computer Science 2024-02-20 Gholamali Aminian, Armin Behnamnia, Roberto Vega, Laura Toni, Chengchun Shi, Hamid R. Rabiee, Omar Rivasplata, Miguel R. D. Rodrigues

An Empirical Evaluation of Federated Contextual Bandit Algorithms

As the adoption of federated learning increases for learning from sensitive data local to user devices, it is natural to ask if the learning can be done using implicit signals generated as users interact with the applications of interest,…

Machine Learning · Computer Science 2023-03-21 Alekh Agarwal, H. Brendan McMahan, Zheng Xu

Context-Aware Query Selection for Active Learning in Event Recognition

Activity recognition is a challenging problem with many practical applications. In addition to the visual features, recent approaches have benefited from the use of context, e.g., inter-relationships among the activities and objects.…

Computer Vision and Pattern Recognition · Computer Science 2019-04-10 Mahmudul Hasan, Sujoy Paul, Anastasios I. Mourikis, Amit K. Roy-Chowdhury

Design of Experiments for Stochastic Contextual Linear Bandits

In the stochastic linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired. In practice, there can be a significant engineering overhead to deploy…

Machine Learning · Computer Science 2021-07-26 Andrea Zanette, Kefan Dong, Jonathan Lee, Emma Brunskill

Off-policy Bandits with Deficient Support

Learning effective contextual-bandit policies from past actions of a deployed system is highly desirable in many settings (e.g. voice assistants, recommendation, search), since it enables the reuse of large amounts of log data.…

Machine Learning · Computer Science 2020-06-18 Noveen Sachdeva, Yi Su, Thorsten Joachims

Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms

Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is…

Machine Learning · Computer Science 2015-03-13 Lihong Li, Wei Chu, John Langford, Xuanhui Wang

DOLCE: Decomposing Off-Policy Evaluation/Learning into Lagged and Current Effects

Off-policy evaluation and learning in contextual bandits use logged interaction data to estimate and optimize the value of a target policy. Most existing methods require sufficient action overlap between the logging and target policies, and…

Machine Learning · Statistics 2026-02-03 Shu Tamano

Anytime-valid off-policy inference for contextual bandits

Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in healthcare and the tech industry. They involve online learning algorithms that adaptively learn policies over time to map observed contexts $X_t$ to…

Methodology · Statistics 2024-08-19 Ian Waudby-Smith, Lili Wu, Aaditya Ramdas, Nikos Karampatziakis, Paul Mineiro

Empirical Likelihood for Contextual Bandits

We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end we apply empirical likelihood techniques to formulate our estimator and confidence…

Machine Learning · Computer Science 2020-10-20 Nikos Karampatziakis, John Langford, Paul Mineiro

Boosted Off-Policy Learning

We propose the first boosting algorithm for off-policy learning from logged bandit feedback. Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate of the policy's expected reward. We analyze…

Machine Learning · Computer Science 2023-05-03 Ben London, Levi Lu, Ted Sandler, Thorsten Joachims

Non-Stationary Off-Policy Optimization

Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to…

Machine Learning · Computer Science 2021-04-06 Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed

Efficient Contextual Bandits with Uninformed Feedback Graphs

Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by Zhang et al. (2023) studies the contextual…

Machine Learning · Computer Science 2024-02-14 Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

Learning in embodied action-perception loops through exploration

Although exploratory behaviors are ubiquitous in the animal kingdom, their computational underpinnings are still largely unknown. Behavioral Psychology has identified learning as a primary drive underlying many exploratory behaviors.…

Machine Learning · Computer Science 2011-12-14 Daniel Y. Little, Friedrich T. Sommer

Meta-Learning for Contextual Bandit Exploration

We describe MELEE, a meta-learning algorithm for learning a good exploration policy in the interactive contextual bandit setting. Here, an algorithm must take actions based on contexts, and learn based only on a reward signal from the…

Machine Learning · Computer Science 2019-01-25 Amr Sharaf, Hal Daumé

Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits

We present and prove properties of a new offline policy evaluator for an exploration learning setting which is superior to previous evaluators. In particular, it simultaneously and correctly incorporates techniques from importance…

Machine Learning · Computer Science 2012-10-19 Miroslav Dudik, Dumitru Erhan, John Langford, Lihong Li

Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets

Enabling robots to learn novel visuomotor skills in a data-efficient manner remains an unsolved problem with myriad challenges. A popular paradigm for tackling this problem is through leveraging large unlabeled datasets that have many…

Robotics · Computer Science 2023-05-16 Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn