Related papers: Case-based off-policy policy evaluation using prot…

200 papers

We consider the problem of off-policy evaluation in Markov decision processes. Off-policy evaluation is the task of evaluating the expected return of one policy with data generated by a different, behavior policy. Importance sampling is a…

Machine Learning · Computer Science 2019-05-13 Josiah P. Hanna, Scott Niekum, Peter Stone

We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy. Importance sampling (IS) has been a key technique to derive (nearly) unbiased…

Machine Learning · Computer Science 2018-10-31 Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou

Importance sampling (IS) represents a fundamental technique for a large surge of off-policy reinforcement learning approaches. Policy gradient (PG) methods, in particular, significantly benefit from IS, enabling the effective reuse of…

Machine Learning · Computer Science 2024-05-10 Matteo Papini, Giorgio Manganini, Alberto Maria Metelli, Marcello Restelli

Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning. While it is consistent and unbiased, it can result in high variance updates to the weights for the value function. In this work,…

Machine Learning · Computer Science 2019-11-15 Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White

Importance sampling (IS) is a popular technique in off-policy evaluation, which re-weights the return of trajectories in the replay buffer to boost sample efficiency. However, training with IS can be unstable and previous attempts to…

Machine Learning · Computer Science 2025-05-20 Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Dong Yan, Jun Zhu

Off-policy learning exhibits greater instability when compared to on-policy learning in reinforcement learning (RL). The difference in probability distribution between the target policy ($\pi$) and the behavior policy (b) is a major cause…

A central challenge to applying many off-policy reinforcement learning algorithms to real world problems is the variance introduced by importance sampling. In off-policy learning, the agent learns about a different policy than the one being…

Machine Learning · Computer Science 2022-06-20 Eric Graves, Sina Ghiassian

Off-policy learning, referring to the procedure of policy optimization with access only to logged feedback data, has shown importance in various real-world applications, such as search engines, recommender systems, etc. While the…

Machine Learning · Computer Science 2023-09-28 Xiaoying Zhang, Junpu Chen, Hongning Wang, Hong Xie, Yang Liu, John C. S. Lui, Hang Li

Evaluating a policy by deploying it in the real world can be risky and costly. Off-policy policy evaluation (OPE) algorithms use historical data collected from running a previous policy to evaluate a new policy, which provides a means for…

Artificial Intelligence · Computer Science 2017-12-07 Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill

Off-policy policy estimators that use importance sampling (IS) can suffer from high variance in long-horizon domains, and there has been particular excitement over new IS methods that leverage the structure of Markov decision processes. We…

Machine Learning · Computer Science 2020-06-09 Yao Liu, Pierre-Luc Bacon, Emma Brunskill

Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples from a distribution to obtain unbiased estimates under another distribution. However,…

Machine Learning · Computer Science 2023-06-28 Kristopher De Asis, Eric Graves, Richard S. Sutton

This paper studies off-policy evaluation (OPE) in reinforcement learning with a focus on behavior policy estimation for importance sampling. Prior work has shown empirically that estimating a history-dependent behavior policy can lead to…

Machine Learning · Computer Science 2025-05-29 Hongyi Zhou, Josiah P. Hanna, Jin Zhu, Ying Yang, Chengchun Shi

Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI)…

Machine Learning · Statistics 2021-06-09 Chengchun Shi, Runzhe Wan, Victor Chernozhukov, Rui Song

This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving,…

Machine Learning · Computer Science 2018-11-08 Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems…

Machine Learning · Computer Science 2023-07-17 Aaman Rebello, Shengpu Tang, Jenna Wiens, Sonali Parbhoo

Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in…

Machine Learning · Computer Science 2022-07-05 Yao Liu, Yannis Flet-Berliac, Emma Brunskill

Recent policy optimization approaches (Schulman et al., 2015a; 2017) have achieved substantial empirical successes by constructing new proxy optimization objectives. These proxy objectives allow stable and low variance policy learning, but…

Machine Learning · Computer Science 2020-02-24 Marcin B. Tomczak, Dongho Kim, Peter Vrancx, Kee-Eung Kim

In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without execution of the target policy.…

Machine Learning · Computer Science 2022-11-04 Jie Wang, Rui Gao, Hongyuan Zha

Importance sampling (IS) is a widely used simulation method for estimating rare event probabilities. In IS, the relative variance of an estimator is the most common measure of estimator accuracy, and the focus of existing literature is on…

Statistics Theory · Mathematics 2026-01-05 Julie Choi, Peter Glynn

In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy Policy Evaluation (OPE) when the true behaviour policy is unknown. Via a series of empirical studies, we demonstrate how accurate OPE is strongly…

Machine Learning · Computer Science 2018-07-11 Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill