Related papers: Minimax-Optimal Off-Policy Evaluation with Linear …

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks. We analyze the deep fitted Q-evaluation method for estimating the expected cumulative reward of a target policy, when the data…

Machine Learning · Computer Science 2022-10-05 Xiang Ji , Minshuo Chen , Mengdi Wang , Tuo Zhao

Minimax Model Learning

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution…

Machine Learning · Computer Science 2021-03-04 Cameron Voloshin , Nan Jiang , Yisong Yue

Reliable Off-policy Evaluation for Reinforcement Learning

In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without execution of the target policy.…

Machine Learning · Computer Science 2022-11-04 Jie Wang , Rui Gao , Hongyuan Zha

On Minimax Optimal Offline Policy Evaluation

This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. We first consider the multi-armed bandit case, establish a minimax…

Artificial Intelligence · Computer Science 2014-09-15 Lihong Li , Remi Munos , Csaba Szepesvari

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference

Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical property is less understood. In this paper, we study the use of bootstrapping in off-policy evaluation…

Machine Learning · Statistics 2022-05-24 Botao Hao , Xiang Ji , Yaqi Duan , Hao Lu , Csaba Szepesvári , Mengdi Wang

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially deep neural networks, has gained practical success. While statistical…

Machine Learning · Statistics 2022-02-11 Ruiqi Zhang , Xuezhou Zhang , Chengzhuo Ni , Mengdi Wang

Variance-Aware Off-Policy Evaluation with Linear Function Approximation

We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on the offline data collected by a behavior policy. We propose…

Machine Learning · Computer Science 2022-01-05 Yifei Min , Tianhao Wang , Dongruo Zhou , Quanquan Gu

Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions

Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its…

Machine Learning · Computer Science 2020-08-12 Omer Gottesman , Joseph Futoma , Yao Liu , Sonali Parbhoo , Leo Anthony Celi , Emma Brunskill , Finale Doshi-Velez

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy…

Machine Learning · Computer Science 2023-09-27 Baturay Saglam , Dogan C. Cicek , Furkan B. Mutlu , Suleyman S. Kozat

Handling Cost and Constraints with Off-Policy Deep Reinforcement Learning

By reusing data throughout training, off-policy deep reinforcement learning algorithms offer improved sample efficiency relative to on-policy approaches. For continuous action spaces, the most popular methods for off-policy learning include…

Machine Learning · Computer Science 2023-12-01 Jared Markowitz , Jesse Silverberg , Gary Collins

Off-Policy Optimization of Portfolio Allocation Policies under Constraints

The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk. We motivate this problem of finding an allocation policy within a…

Artificial Intelligence · Computer Science 2020-12-23 Nymisha Bandi , Theja Tulabandhula

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) importance weights and value functions. Our contributions include: (1) A new estimator, MWL, that…

Machine Learning · Computer Science 2020-10-08 Masatoshi Uehara , Jiawei Huang , Nan Jiang

Minimax Off-Policy Evaluation for Multi-Armed Bandits

We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards, and develop minimax rate-optimal procedures under three settings. First, when the behavior policy is known, we show that the Switch…

Machine Learning · Statistics 2021-01-20 Cong Ma , Banghua Zhu , Jiantao Jiao , Martin J. Wainwright

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such \emph{approximation factors} --…

Machine Learning · Computer Science 2023-12-18 Philip Amortila , Nan Jiang , Csaba Szepesvári

Q($\lambda$) with Off-Policy Corrections

We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of…

Artificial Intelligence · Computer Science 2016-08-12 Anna Harutyunyan , Marc G. Bellemare , Tom Stepleton , Remi Munos

On Gap-dependent Bounds for Offline Reinforcement Learning

This paper presents a systematic study on gap-dependent sample complexity in offline reinforcement learning. Prior work showed when the density ratio between an optimal policy and the behavior policy is upper bounded (the optimal policy…

Machine Learning · Computer Science 2022-08-05 Xinqi Wang , Qiwen Cui , Simon S. Du

Pessimistic Off-Policy Optimization for Learning to Rank

Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are…

Machine Learning · Computer Science 2024-10-23 Matej Cief , Branislav Kveton , Michal Kompan

Combining Parametric and Nonparametric Models for Off-Policy Evaluation

We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning. Our method takes a mixture-of-experts approach to combine parametric and non-parametric models of the environment such that the final value…

Machine Learning · Computer Science 2020-02-19 Omer Gottesman , Yao Liu , Scott Sussex , Emma Brunskill , Finale Doshi-Velez

Model-based Offline Reinforcement Learning with Local Misspecification

We present a model-based offline reinforcement learning policy performance lower bound that explicitly captures dynamics model misspecification and distribution mismatch and we propose an empirical algorithm for optimal offline policy…

Machine Learning · Computer Science 2023-01-30 Kefan Dong , Yannis Flet-Berliac , Allen Nie , Emma Brunskill

More Efficient Off-Policy Evaluation through Regularized Targeted Learning

We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies. In…

Machine Learning · Computer Science 2019-12-16 Aurélien F. Bibaut , Ivana Malenica , Nikos Vlassis , Mark J. van der Laan