Related papers: Semi-supervised Batch Learning From Logged Data

Reliable Off-policy Evaluation for Reinforcement Learning

In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without executing the target policy.…

Machine Learning · Computer Science · 2022-11-04 · Jie Wang, Rui Gao, Hongyuan Zha

Zero-Shot Off-Policy Learning

Off-policy learning methods seek to derive an optimal policy directly from a fixed dataset of prior interactions. This objective presents significant challenges, primarily due to the inherent distributional shift and value function…

Machine Learning · Computer Science · 2026-02-03 · Arip Asadulaev, Maksim Bobrin, Salem Lahlou, Dmitry Dylov, Fakhri Karray, Martin Takac

Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy

When learning from a batch of logged bandit feedback, the discrepancy between the policy to be learned and the off-policy training data imposes statistical and computational challenges. Unlike classical supervised learning and online…

Machine Learning · Computer Science · 2018-08-02 · Yuan Xie, Boyi Liu, Qiang Liu, Zhaoran Wang, Yuan Zhou, Jian Peng

Boosted Off-Policy Learning

We propose the first boosting algorithm for off-policy learning from logged bandit feedback. Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate of the policy's expected reward. We analyze…

Machine Learning · Computer Science · 2023-05-03 · Ben London, Levi Lu, Ted Sandler, Thorsten Joachims

Off-Policy Evaluation from Logged Human Feedback

Learning from human feedback has been central to recent advances in artificial intelligence and machine learning. Since the collection of human feedback is costly, a natural question to ask is whether new feedback always needs to be collected.…

Machine Learning · Computer Science · 2024-06-17 · Aniruddha Bhargava, Lalit Jain, Branislav Kveton, Ge Liu, Subhojyoti Mukherjee

Uncertainty-Aware Instance Reweighting for Off-Policy Learning

Off-policy learning, referring to the procedure of policy optimization with access only to logged feedback data, has shown importance in various real-world applications, such as search engines and recommender systems. While the…

Machine Learning · Computer Science · 2023-09-28 · Xiaoying Zhang, Junpu Chen, Hongning Wang, Hong Xie, Yang Liu, John C. S. Lui, Hang Li

Off-policy Learning for Multiple Loggers

It is well known that historical logs are used for evaluating and learning policies in interactive systems, e.g. recommendation, search, and online advertising. Since direct online policy learning usually harms user experiences, it is…

Machine Learning · Statistics · 2019-08-06 · Li He, Long Xia, Wei Zeng, Zhi-Ming Ma, Yihong Zhao, Dawei Yin

Pessimistic Off-Policy Optimization for Learning to Rank

Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are…

Machine Learning · Computer Science · 2024-10-23 · Matej Cief, Branislav Kveton, Michal Kompan

Semi-Parametric Efficient Policy Learning with Continuous Actions

We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach where the value…

Econometrics · Economics · 2019-07-23 · Mert Demirer, Vasilis Syrgkanis, Greg Lewis, Victor Chernozhukov

Non-Stationary Off-Policy Optimization

Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to…

Machine Learning · Computer Science · 2021-04-06 · Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed

Off-Policy Imitation Learning from Observations

Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit through the reuse of incomplete resources. Compared to conventional imitation learning (IL), LfO is more challenging…

Machine Learning · Computer Science · 2021-03-01 · Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou

Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, recommendation), where an algorithm makes a…

Machine Learning · Computer Science · 2015-05-22 · Adith Swaminathan, Thorsten Joachims

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment. These are critical shortcomings for applying RL to real-world problems where…

Machine Learning · Computer Science · 2019-07-09 · Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

Chaining Value Functions for Off-Policy Learning

To accumulate knowledge and improve its policy of behaviour, a reinforcement learning agent can learn 'off-policy' about policies that differ from the policy used to generate its experience. This is important to learn counterfactuals, or…

Machine Learning · Computer Science · 2022-02-03 · Simon Schmitt, John Shawe-Taylor, Hado van Hasselt

A Survey on Semi-Supervised Learning Techniques

Semi-supervised learning is a learning paradigm concerned with how computers and natural systems, such as human beings, acquire knowledge in the presence of both labeled and unlabeled data. Semi-supervised learning based methods…

Machine Learning · Computer Science · 2014-02-20 · V. Jothi Prakash, Dr. L. M. Nithya

Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, i.e., semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques…

Machine Learning · Computer Science · 2011-09-12 · N. V. Chawla, Grigoris Karakoulas

Batch Policy Learning under Constraints

When learning policies for real-world domains, two important questions arise: (i) how to efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate among different competing objectives and constraints. We…

Machine Learning · Computer Science · 2019-03-22 · Hoang M. Le, Cameron Voloshin, Yisong Yue

Off-Policy Evaluation in Partially Observable Environments

This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments. Off-policy evaluation under partial observability is inherently prone to bias, with risk of arbitrarily large…

Machine Learning · Computer Science · 2019-11-26 · Guy Tennenholtz, Shie Mannor, Uri Shalit

Towards Robust Off-policy Learning for Runtime Uncertainty

Off-policy learning plays a pivotal role in optimizing and evaluating policies prior to online deployment. However, during real-time serving, we observe a variety of interventions and constraints that cause inconsistency between the…

Machine Learning · Computer Science · 2022-03-01 · Da Xu, Yuting Ye, Chuanwei Ruan, Bo Yang

Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains

Reinforcement learning algorithms have had tremendous successes in online learning settings. However, these successes have relied on low-stakes interactions between the algorithmic agent and its environment. In many settings where RL could…

Machine Learning · Computer Science · 2020-06-05 · James Bannon, Brad Windsor, Wenbo Song, Tao Li