Related papers: Variance-Aware Off-Policy Evaluation with Linear F…

More Efficient Off-Policy Evaluation through Regularized Targeted Learning

We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies. In…

Machine Learning · Computer Science 2019-12-16 Aurélien F. Bibaut, Ivana Malenica, Nikos Vlassis, Mark J. van der Laan

Conformal Off-Policy Evaluation in Markov Decision Processes

Reinforcement Learning aims at identifying and evaluating efficient control policies from data. In many real-world applications, the learner is not allowed to experiment and cannot gather data in an online manner (this is the case when…

Machine Learning · Computer Science 2024-07-02 Daniele Foffano, Alessio Russo, Alexandre Proutiere

Projected State-action Balancing Weights for Offline Reinforcement Learning

Offline policy evaluation (OPE) is considered a fundamental and challenging problem in reinforcement learning (RL). This paper focuses on the value estimation of a target policy based on pre-collected data generated from a possibly…

Machine Learning · Computer Science 2022-06-13 Jiayi Wang, Zhengling Qi, Raymond K. W. Wong

An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can…

Machine Learning · Statistics 2023-02-03 Yang Xu, Jin Zhu, Chengchun Shi, Shikai Luo, Rui Song

Off-Policy Evaluation via Off-Policy Classification

In this work, we consider the problem of model selection for deep reinforcement learning (RL) in real-world environments. Typically, the performance of deep RL algorithms is evaluated via on-policy interactions with the target environment.…

Machine Learning · Computer Science 2019-11-26 Alex Irpan, Kanishka Rao, Konstantinos Bousmalis, Chris Harris, Julian Ibarz, Sergey Levine

Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling

Motivated by the many real-world applications of reinforcement learning (RL) that require safe-policy iterations, we consider the problem of off-policy evaluation (OPE) -- the problem of evaluating a new policy using the historical data…

Machine Learning · Computer Science 2020-04-02 Tengyang Xie, Yifei Ma, Yu-Xiang Wang

$\Delta\text{-}{\rm OPE}$: Off-Policy Estimation with Pairs of Policies

The off-policy paradigm casts recommendation as a counterfactual decision-making task, allowing practitioners to unbiasedly estimate online metrics using offline data. This leads to effective evaluation metrics, as well as learning…

Machine Learning · Computer Science 2024-09-17 Olivier Jeunen, Aleksei Ustimenko

On Instrumental Variable Regression for Deep Offline Policy Evaluation

We show that the popular reinforcement learning (RL) strategy of estimating the state-action value (Q-function) by minimizing the mean squared Bellman error leads to a regression problem with confounding, the inputs and output noise being…

Machine Learning · Computer Science 2022-12-01 Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism

Offline reinforcement learning, which seeks to utilize offline/historical data to optimize sequential decision-making strategies, has gained surging prominence in recent studies. Due to the advantage that appropriate function approximators…

Machine Learning · Computer Science 2022-03-14 Ming Yin, Yaqi Duan, Mengdi Wang, Yu-Xiang Wang

Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks

A recently popular approach to solving reinforcement learning is with data from human preferences. In fact, human preference data are now used with classic reinforcement learning algorithms such as actor-critic methods, which involve…

Machine Learning · Computer Science 2024-02-28 Zihao Li, Xiang Ji, Minshuo Chen, Mengdi Wang

Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds

Off-policy evaluation (OPE) is the task of estimating the expected reward of a given policy based on offline data previously collected under different policies. Therefore, OPE is a key step in applying reinforcement learning to real-world…

Machine Learning · Computer Science 2021-03-11 Yihao Feng, Ziyang Tang, Na Zhang, Qiang Liu

Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders

Off-policy evaluation (OPE) in reinforcement learning is an important problem in settings where experimentation is limited, such as education and healthcare. But, in these very same settings, observed actions are often confounded by…

Machine Learning · Computer Science 2020-07-29 Andrew Bennett, Nathan Kallus, Lihong Li, Ali Mousavi

Accountable Off-Policy Evaluation With Kernel Bellman Statistics

We consider off-policy evaluation (OPE), which evaluates the performance of a new policy from observed data collected from previous experiments, without requiring the execution of the new policy. This finds important applications in areas…

Machine Learning · Computer Science 2020-08-18 Yihao Feng, Tongzheng Ren, Ziyang Tang, Qiang Liu

Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning

How to select between policies and value functions produced by different training algorithms in offline reinforcement learning (RL) -- which is crucial for hyperparameter tuning -- is an important open question. Existing approaches based…

Machine Learning · Computer Science 2021-11-04 Siyuan Zhang, Nan Jiang

Automated Off-Policy Estimator Selection via Supervised Learning

The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of counterfactual policies with data collected by another one. To solve the OPE problem, we resort to estimators, which aim to estimate in the most accurate way…

Machine Learning · Computer Science 2024-11-12 Nicolò Felicioni, Michael Benigni, Maurizio Ferrari Dacrema

A maximum-entropy approach to off-policy evaluation in average-reward MDPs

This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e. where rewards and dynamics are linear in some known…

Machine Learning · Computer Science 2020-06-24 Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Gorur, Chris Harris, Dale Schuurmans

Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning

The problem of Offline Policy Evaluation (OPE) in Reinforcement Learning (RL) is a critical step towards applying RL in real-life applications. Existing work on OPE mostly focuses on evaluating a fixed target policy $\pi$, which does not…

Machine Learning · Computer Science 2020-12-02 Ming Yin, Yu Bai, Yu-Xiang Wang

Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

We study minimax methods for off-policy evaluation (OPE) using value functions and marginalized importance weights. Despite that they hold promises of overcoming the exponential variance in traditional importance sampling, several key…

Machine Learning · Computer Science 2020-11-06 Nan Jiang, Jiawei Huang

In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the expected return of an evaluation policy given a fixed dataset that was collected by running one or more different policies. One of the more empirically…

Machine Learning · Computer Science 2023-10-31 Brahma S. Pavse, Josiah P. Hanna

A Review of Off-Policy Evaluation in Reinforcement Learning

Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has been recently applied to solve a number of challenging problems. In this paper, we primarily focus on off-policy evaluation (OPE), one of…

Machine Learning · Statistics 2022-12-14 Masatoshi Uehara, Chengchun Shi, Nathan Kallus