Related papers: A Batch, Off-Policy, Actor-Critic Algorithm for Op…

Off-Policy Actor-Critic

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on…

Machine Learning · Computer Science 2015-03-20 Thomas Degris, Martha White, Richard S. Sutton

Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

In this paper, we propose a distributed off-policy actor-critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update…

Machine Learning · Computer Science 2019-03-25 Yan Zhang, Michael M. Zavlanos

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy…

Machine Learning · Computer Science 2019-11-20 Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Basar, Ji Liu

Revisiting stochastic off-policy action-value gradients

Off-policy stochastic actor-critic methods rely on approximating the stochastic policy gradient in order to derive an optimal policy. One may also derive the optimal policy by approximating the action-value gradient. The use of action-value…

Machine Learning · Statistics 2017-03-14 Yemi Okesanjo, Victor Kofia

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step, to estimate the value of the new…

Machine Learning · Computer Science 2019-12-12 Riashat Islam, Raihan Seraj, Samin Yeasar Arnob, Doina Precup

Explaining Off-Policy Actor-Critic From A Bias-Variance Perspective

Off-policy Actor-Critic algorithms have demonstrated phenomenal experimental performance but still require better explanations. To this end, we show its policy evaluation error on the distribution of transitions decomposes into: a Bellman…

Machine Learning · Computer Science 2021-10-07 Ting-Han Fan, Peter J. Ramadge

Optimal Actor-Critic Policy with Optimized Training Datasets

Actor-critic (AC) algorithms are known for their efficacy and high performance in solving reinforcement learning problems, but they also suffer from low sampling efficiency. An AC based policy optimization process is iterative and needs to…

Machine Learning · Computer Science 2021-12-02 Chayan Banerjee, Zhiyong Chen, Nasimul Noman, Mohsen Zamani

Off-Policy Average Reward Actor-Critic with Deterministic Policy Search

The average reward criterion is relatively less studied as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. There are few recent works that present on-policy average reward actor-critic…

Machine Learning · Computer Science 2023-07-20 Naman Saxena, Subhojyoti Khastigir, Shishir Kolathaya, Shalabh Bhatnagar

Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as "off-policy control" in RL where an agent's objective is to compute an optimal policy based on the data…

Machine Learning · Computer Science 2022-06-16 Raghuram Bharadwaj Diddigi, Prateek Jain, Prabuchandran K. J., Shalabh Bhatnagar

Variance Penalized On-Policy and Off-Policy Actor-Critic

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this…

Machine Learning · Computer Science 2021-02-04 Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup

Batch Policy Learning in Average Reward Markov Decision Processes

We consider the batch (off-line) policy learning problem in the infinite horizon Markov Decision Process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a…

Statistics Theory · Mathematics 2022-09-20 Peng Liao, Zhengling Qi, Runzhe Wan, Predrag Klasnja, Susan Murphy

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

Off-Policy Actor-Critic with Shared Experience Replay

We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay (b) stability…

Machine Learning · Computer Science 2019-11-19 Simon Schmitt, Matteo Hessel, Karen Simonyan

Distillation Policy Optimization

While on-policy algorithms are known for their stability, they often demand a substantial number of samples. In contrast, off-policy algorithms, which leverage past experiences, are considered sample-efficient but tend to exhibit…

Machine Learning · Computer Science 2023-09-28 Jianfei Ma

An Actor-Critic Method for Simulation-Based Optimization

We focus on a simulation-based optimization problem of choosing the best design from the feasible space. Although the simulation model can be queried with finite samples, its internal processing rule cannot be utilized in the optimization…

Machine Learning · Computer Science 2021-11-02 Kuo Li, Qing-Shan Jia, Jiaqi Yan

Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction

Improving the sample efficiency of reinforcement learning algorithms requires effective exploration. Following the principle of optimism in the face of uncertainty (OFU), we train a separate exploration policy to maximize the…

Machine Learning · Computer Science 2022-11-23 Jiachen Li, Shuo Cheng, Zhenyu Liao, Huayan Wang, William Yang Wang, Qinxun Bai

Reliable Off-policy Evaluation for Reinforcement Learning

In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without execution of the target policy.…

Machine Learning · Computer Science 2022-11-04 Jie Wang, Rui Gao, Hongyuan Zha

Multi-agent Off-policy Actor-Critic Reinforcement Learning for Partially Observable Environments

This study proposes the use of a social learning method to estimate a global state within a multi-agent off-policy actor-critic algorithm for reinforcement learning (RL) operating in a partially observable environment. We assume that the…

Machine Learning · Computer Science 2024-07-09 Ainur Zhaikhan, Ali H. Sayed

Actor-Attention-Critic for Multi-Agent Reinforcement Learning

Reinforcement learning in multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in single-agent settings. We present an actor-critic algorithm that trains decentralized policies in…

Machine Learning · Computer Science 2019-05-29 Shariq Iqbal, Fei Sha

An Off-policy Policy Gradient Theorem Using Emphatic Weightings

Policy gradient methods are widely used for control in reinforcement learning, particularly for the continuous action setting. There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence…

Machine Learning · Computer Science 2019-06-21 Ehsan Imani, Eric Graves, Martha White