Related papers: Divergence-Augmented Policy Optimization

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to…

Machine Learning · Computer Science · 2017-06-02 · Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Bernhard Schölkopf, Sergey Levine

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we…

Machine Learning · Computer Science · 2026-01-06 · Alexander W. Goodall, Edwin Hamel-De le Court, Francesco Belardinelli

Trajectory-Based Off-Policy Deep Reinforcement Learning

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently…

Machine Learning · Computer Science · 2019-05-15 · Andreas Doerr, Michael Volpp, Marc Toussaint, Sebastian Trimpe, Christian Daniel

Off-Policy Policy Gradient with State Distribution Correction

We study the problem of off-policy policy optimization in Markov decision processes, and develop a novel off-policy policy gradient method. Prior off-policy policy gradient approaches have generally ignored the mismatch between the…

Machine Learning · Computer Science · 2019-07-09 · Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Distributionally-Constrained Policy Optimization via Unbalanced Optimal Transport

We consider constrained policy optimization in Reinforcement Learning, where the constraints are in the form of marginals on state visitations and global action executions. Given these distributions, we formulate policy optimization as…

Machine Learning · Computer Science · 2021-02-17 · Arash Givchi, Pei Wang, Junqi Wang, Patrick Shafto

On the connection between Bregman divergence and value in regularized Markov decision processes

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has…

Machine Learning · Computer Science · 2022-11-08 · Brendan O'Donoghue

Automatic Reward Shaping from Confounded Offline Data

A key task in Artificial Intelligence is learning effective policies for controlling agents in unknown environments to optimize performance measures. Off-policy learning methods, like Q-learning, allow learners to make optimal decisions…

Artificial Intelligence · Computer Science · 2025-09-10 · Mingxuan Li, Junzhe Zhang, Elias Bareinboim

Adaptive Policy Learning to Additional Tasks

This paper develops a policy learning method for tuning a pre-trained policy to adapt to additional tasks without altering the original task. A method named Adaptive Policy Gradient (APG) is proposed in this paper, which combines Bellman's…

Machine Learning · Computer Science · 2025-09-29 · Wenjian Hao, Zehui Lu, Zihao Liang, Tianyu Zhou, Shaoshuai Mou

On- and Off-Policy Monotonic Policy Improvement

Monotonic policy improvement and off-policy learning are two main desirable properties for reinforcement learning algorithms. In this paper, by lower bounding the performance difference of two policies, we show that the monotonic policy…

Artificial Intelligence · Computer Science · 2017-11-02 · Ryo Iwaki, Minoru Asada

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy…

Machine Learning · Computer Science · 2023-09-27 · Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach

In order for reinforcement learning techniques to be useful in real-world decision making processes, they must be able to produce robust performance from limited data. Deep policy optimization methods have achieved impressive results on…

Machine Learning · Computer Science · 2020-12-22 · James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras

Bi-Level Offline Policy Optimization with Limited Exploration

We study offline reinforcement learning (RL) which seeks to learn a good policy based on a fixed, pre-collected dataset. A fundamental challenge behind this task is the distributional shift due to the dataset lacking sufficient exploration,…

Machine Learning · Computer Science · 2023-10-11 · Wenzhuo Zhou

Bregman Gradient Policy Optimization

In the paper, we design a novel Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques. Specifically, we propose a Bregman gradient policy optimization (BGPO) algorithm…

Machine Learning · Computer Science · 2022-03-17 · Feihu Huang, Shangqian Gao, Heng Huang

Stable Policy Optimization via Off-Policy Divergence Regularization

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL). While these methods achieve state-of-the-art performance across a…

Machine Learning · Computer Science · 2020-06-22 · Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent

Confounding Robust Deep Reinforcement Learning: A Causal Approach

A key task in Artificial Intelligence is learning effective policies for controlling agents in unknown environments to optimize performance measures. Off-policy learning methods, like Q-learning, allow learners to make optimal decisions…

Artificial Intelligence · Computer Science · 2025-10-27 · Mingxuan Li, Junzhe Zhang, Elias Bareinboim

Statistically Efficient Off-Policy Policy Gradients

Policy gradient methods in reinforcement learning update policy parameters by taking steps in the direction of an estimated gradient of policy value. In this paper, we consider the statistically efficient estimation of policy gradients from…

Machine Learning · Statistics · 2020-02-21 · Nathan Kallus, Masatoshi Uehara

Data-Based Efficient Off-Policy Stabilizing Optimal Control Algorithms for Discrete-Time Linear Systems via Damping Coefficients

Policy iteration is one of the classical frameworks of reinforcement learning, which requires a known initial stabilizing control. However, finding the initial stabilizing control depends on the known system model. To relax this requirement…

Systems and Control · Electrical Eng. & Systems · 2025-03-20 · Dongdong Li, Jiuxiang Dong

Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as "off-policy control" in RL where an agent's objective is to compute an optimal policy based on the data…

Machine Learning · Computer Science · 2022-06-16 · Raghuram Bharadwaj Diddigi, Prateek Jain, Prabuchandran K. J., Shalabh Bhatnagar

Variance-Reduced Off-Policy Memory-Efficient Policy Search

Off-policy policy optimization is a challenging problem in reinforcement learning (RL). The algorithms designed for this problem often suffer from high variance in their estimators, which results in poor sample efficiency, and have issues…

Machine Learning · Computer Science · 2020-09-15 · Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu

Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline

Policy-based methods have achieved remarkable success in solving challenging reinforcement learning problems. Among these methods, off-policy policy gradient methods are particularly important because they can benefit from off-policy…

Machine Learning · Computer Science · 2024-05-07 · Wenjia Meng, Qian Zheng, Long Yang, Yilong Yin, Gang Pan