Related papers: Zero-Shot Off-Policy Learning

Non-Stationary Off-Policy Optimization

Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to…

Machine Learning · Computer Science 2021-04-06 Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed

Improving Zero-Shot Offline RL via Behavioral Task Sampling

Offline zero-shot reinforcement learning (RL) aims to learn agents that optimize unseen reward functions without additional environment interaction. The standard approach to this problem trains task-conditioned policies by sampling task…

Artificial Intelligence · Computer Science 2026-04-29 Nazim Bendib, Nicolas Perrin-Gilbert, Olivier Sigaud

Evaluation-Time Policy Switching for Offline Reinforcement Learning

Offline reinforcement learning (RL) looks at learning how to optimally solve tasks using a fixed dataset of interactions from the environment. Many off-policy algorithms developed for online learning struggle in the offline setting as they…

Machine Learning · Computer Science 2025-03-18 Natinael Solomon Neggatu, Jeremie Houssineau, Giovanni Montana

Semi-Parametric Efficient Policy Learning with Continuous Actions

We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach where the value…

Econometrics · Economics 2019-07-23 Mert Demirer, Vasilis Syrgkanis, Greg Lewis, Victor Chernozhukov

OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching

Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and…

Machine Learning · Computer Science 2022-05-24 Hana Hoshino, Kei Ota, Asako Kanezaki, Rio Yokota

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy…

Machine Learning · Computer Science 2023-09-27 Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

Semi-supervised Batch Learning From Logged Data

Off-policy learning methods are intended to learn a policy from logged data, which includes context, action, and feedback (cost or reward) for each sample point. In this work, we build on the counterfactual risk minimization framework,…

Machine Learning · Computer Science 2024-02-20 Gholamali Aminian, Armin Behnamnia, Roberto Vega, Laura Toni, Chengchun Shi, Hamid R. Rabiee, Omar Rivasplata, Miguel R. D. Rodrigues

Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift

Off-policy deep reinforcement learning (RL) algorithms are incapable of learning solely from batch offline data without online interactions with the environment, due to the phenomenon known as extrapolation error. This is often due…

Machine Learning · Computer Science 2019-12-03 Riashat Islam, Komal K. Teru, Deepak Sharma, Joelle Pineau

Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction

Improving the sample efficiency of reinforcement learning algorithms requires effective exploration. Following the principle of optimism in the face of uncertainty (OFU), we train a separate exploration policy to maximize the…

Machine Learning · Computer Science 2022-11-23 Jiachen Li, Shuo Cheng, Zhenyu Liao, Huayan Wang, William Yang Wang, Qinxun Bai

Chaining Value Functions for Off-Policy Learning

To accumulate knowledge and improve its policy of behaviour, a reinforcement learning agent can learn 'off-policy' about policies that differ from the policy used to generate its experience. This is important to learn counterfactuals, or…

Machine Learning · Computer Science 2022-02-03 Simon Schmitt, John Shawe-Taylor, Hado van Hasselt

Minimax Model Learning

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution…

Machine Learning · Computer Science 2021-03-04 Cameron Voloshin, Nan Jiang, Yisong Yue

Off Policy Risk Sensitive Reinforcement Learning Based Optimal Tracking Control with Prescribe Performances

An off policy reinforcement learning based control strategy is developed for the optimal tracking control problem to achieve the prescribed performance of full states during the learning process. The optimal tracking control problem is…

Systems and Control · Electrical Eng. & Systems 2020-09-02 C. Li, Y. Wang, F. Liu, M. Buss

Distributional Successor Features Enable Zero-Shot Policy Optimization

Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through…

Machine Learning · Computer Science 2025-01-22 Chuning Zhu, Xinqi Wang, Tyler Han, Simon S. Du, Abhishek Gupta

Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as "off-policy control" in RL where an agent's objective is to compute an optimal policy based on the data…

Machine Learning · Computer Science 2022-06-16 Raghuram Bharadwaj Diddigi, Prateek Jain, Prabuchandran K. J., Shalabh Bhatnagar

Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error

Training agents via off-policy deep reinforcement learning (RL) requires a large memory, named replay memory, that stores past experiences used for learning. These experiences are sampled, uniformly or non-uniformly, to create the batches…

Machine Learning · Computer Science 2022-12-27 Bumgeun Park, Taeyoung Kim, Woohyeon Moon, Luiz Felipe Vecchietti, Dongsoo Har

Reliable Off-policy Evaluation for Reinforcement Learning

In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without execution of the target policy.…

Machine Learning · Computer Science 2022-11-04 Jie Wang, Rui Gao, Hongyuan Zha

Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control

Reinforcement learning (RL) is widely used for humanoid control, with on-policy methods such as Proximal Policy Optimization (PPO) enabling robust training via large-scale parallel simulation and, in some cases, zero-shot deployment to real…

Robotics · Computer Science 2026-02-24 Weidong Huang, Zhehan Li, Hangxin Liu, Biao Hou, Yao Su, Jingwen Zhang

A Conservative Approach for Few-Shot Transfer in Off-Dynamics Reinforcement Learning

Off-dynamics Reinforcement Learning (ODRL) seeks to transfer a policy from a source environment to a target environment characterized by distinct yet similar dynamics. In this context, traditional RL agents depend excessively on the…

Machine Learning · Computer Science 2024-07-16 Paul Daoudi, Christophe Prieur, Bogdan Robu, Merwan Barlier, Ludovic Dos Santos

Off-policy Learning with Eligibility Traces: A Survey

In the framework of Markov Decision Processes, off-policy learning, that is the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We briefly…

Artificial Intelligence · Computer Science 2013-04-16 Matthieu Geist, Bruno Scherrer

Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning

Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy. In this paper, we study a subtle…

Machine Learning · Computer Science 2022-10-12 Rujie Zhong, Duohan Zhang, Lukas Schäfer, Stefano V. Albrecht, Josiah P. Hanna