Related papers: Learning from Logged Implicit Exploration Data

200 papers

We address policy learning with logged data in contextual bandits. Current offline-policy learning algorithms are mostly based on inverse propensity score (IPS) weighting requiring the logging policy to have full support, i.e. a…

Machine Learning · Statistics 2021-07-27 Hung Tran-The, Sunil Gupta, Thanh Nguyen-Tang, Santu Rana, Svetha Venkatesh

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition…

Machine Learning · Computer Science 2026-04-28 Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos

This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from intractable or loose bounds. We analyse the problem through the…

Machine Learning · Statistics 2023-05-30 Otmane Sakhi, Pierre Alquier, Nicolas Chopin

We consider active learning with logged data, where labeled examples are drawn conditioned on a predetermined logging policy, and the goal is to learn a classifier on the entire population, not just conditioned on the logging policy. Prior…

Machine Learning · Computer Science 2018-06-14 Songbai Yan, Kamalika Chaudhuri, Tara Javidi

Off-policy learning methods are intended to learn a policy from logged data, which includes context, action, and feedback (cost or reward) for each sample point. In this work, we build on the counterfactual risk minimization framework,…

As the adoption of federated learning increases for learning from sensitive data local to user devices, it is natural to ask if the learning can be done using implicit signals generated as users interact with the applications of interest,…

Machine Learning · Computer Science 2023-03-21 Alekh Agarwal, H. Brendan McMahan, Zheng Xu

Activity recognition is a challenging problem with many practical applications. In addition to the visual features, recent approaches have benefited from the use of context, e.g., inter-relationships among the activities and objects.…

Computer Vision and Pattern Recognition · Computer Science 2019-04-10 Mahmudul Hasan, Sujoy Paul, Anastasios I. Mourikis, Amit K. Roy-Chowdhury

In the stochastic linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired. In practice, there can be a significant engineering overhead to deploy…

Machine Learning · Computer Science 2021-07-26 Andrea Zanette, Kefan Dong, Jonathan Lee, Emma Brunskill

Learning effective contextual-bandit policies from past actions of a deployed system is highly desirable in many settings (e.g. voice assistants, recommendation, search), since it enables the reuse of large amounts of log data.…

Machine Learning · Computer Science 2020-06-18 Noveen Sachdeva, Yi Su, Thorsten Joachims

Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is…

Machine Learning · Computer Science 2015-03-13 Lihong Li, Wei Chu, John Langford, Xuanhui Wang

Off-policy evaluation and learning in contextual bandits use logged interaction data to estimate and optimize the value of a target policy. Most existing methods require sufficient action overlap between the logging and target policies, and…

Machine Learning · Statistics 2026-02-03 Shu Tamano

Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in healthcare and the tech industry. They involve online learning algorithms that adaptively learn policies over time to map observed contexts $X_t$ to…

Methodology · Statistics 2024-08-19 Ian Waudby-Smith, Lili Wu, Aaditya Ramdas, Nikos Karampatziakis, Paul Mineiro

We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end we apply empirical likelihood techniques to formulate our estimator and confidence…

Machine Learning · Computer Science 2020-10-20 Nikos Karampatziakis, John Langford, Paul Mineiro

We propose the first boosting algorithm for off-policy learning from logged bandit feedback. Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate of the policy's expected reward. We analyze…

Machine Learning · Computer Science 2023-05-03 Ben London, Levi Lu, Ted Sandler, Thorsten Joachims

Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to…

Machine Learning · Computer Science 2021-04-06 Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed

Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by Zhang et al. (2023) studies the contextual…

Machine Learning · Computer Science 2024-02-14 Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

Although exploratory behaviors are ubiquitous in the animal kingdom, their computational underpinnings are still largely unknown. Behavioral Psychology has identified learning as a primary drive underlying many exploratory behaviors.…

Machine Learning · Computer Science 2011-12-14 Daniel Y. Little, Friedrich T. Sommer

We describe MELEE, a meta-learning algorithm for learning a good exploration policy in the interactive contextual bandit setting. Here, an algorithm must take actions based on contexts, and learn based only on a reward signal from the…

Machine Learning · Computer Science 2019-01-25 Amr Sharaf, Hal Daumé

We present and prove properties of a new offline policy evaluator for an exploration learning setting which is superior to previous evaluators. In particular, it simultaneously and correctly incorporates techniques from importance…

Machine Learning · Computer Science 2012-10-19 Miroslav Dudik, Dumitru Erhan, John Langford, Lihong Li

Enabling robots to learn novel visuomotor skills in a data-efficient manner remains an unsolved problem with myriad challenges. A popular paradigm for tackling this problem is through leveraging large unlabeled datasets that have many…

Robotics · Computer Science 2023-05-16 Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn