Related papers: Off-Policy Actor-Critic

200 papers

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy…

Machine Learning · Computer Science 2019-11-20 Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Basar, Ji Liu

Policy gradient methods are widely used for control in reinforcement learning, particularly for the continuous action setting. There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence…

Machine Learning · Computer Science 2019-06-21 Ehsan Imani, Eric Graves, Martha White

We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step, to estimate the value of the new…

Machine Learning · Computer Science 2019-12-12 Riashat Islam, Raihan Seraj, Samin Yeasar Arnob, Doina Precup

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy…

Machine Learning · Computer Science 2023-09-27 Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and…

Machine Learning · Computer Science 2018-08-10 Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine

We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay (b) stability…

Machine Learning · Computer Science 2019-11-19 Simon Schmitt, Matteo Hessel, Karen Simonyan

Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference, and the critic in turn provides a loss for the…

Machine Learning · Computer Science 2020-11-03 Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy M. Hospedales

In this paper, we propose a distributed off-policy actor critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update…

Machine Learning · Computer Science 2019-03-25 Yan Zhang, Michael M. Zavlanos

We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.

Machine Learning · Statistics 2016-07-19 S. A. Murphy, Y. Deng, E. B. Laber, H. R. Maei, R. S. Sutton, K. Witkiewitz

Model-free deep reinforcement learning has achieved great success in many domains, such as video games, recommendation systems and robotic control tasks. In continuous control tasks, widely used policies with Gaussian distributions result…

Machine Learning · Computer Science 2023-06-05 Lingwei Peng, Hui Qian, Zhebang Shen, Chao Zhang, Fei Li

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science 2023-01-16 Zaiwei Chen, Siva Theja Maguluri

While on-policy algorithms are known for their stability, they often demand a substantial number of samples. In contrast, off-policy algorithms, which leverage past experiences, are considered sample-efficient but tend to exhibit…

Machine Learning · Computer Science 2023-09-28 Jianfei Ma

Actor-critic algorithms learn an explicit policy (actor), and an accompanying value function (critic). The actor performs actions in the environment, while the critic evaluates the actor's current policy. However, despite their stability…

Artificial Intelligence · Computer Science 2019-02-08 Hélène Plisnier, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé

Off-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy. Gradient-based off-policy learning algorithms, such as GTD and TDC/GQ, converge even when using…

Artificial Intelligence · Computer Science 2015-12-15 Lucas Lehnert, Doina Precup

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this…

Machine Learning · Computer Science 2021-02-04 Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup

We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting. Compared to the commonly used excursion objective, which…

Machine Learning · Computer Science 2019-10-29 Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to…

Machine Learning · Computer Science 2021-08-20 Andrea Zanette, Martin J. Wainwright, Emma Brunskill

Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks, by leveraging previously collected data for policy learning. However, most existing off-policy RL algorithms fail to maximally…

Machine Learning · Computer Science 2024-05-30 Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan

The oscillating performance of off-policy learning and persisting errors in the actor-critic (AC) setting call for algorithms that can conservatively learn to suit the stability-critical applications better. In this paper, we propose a…

Machine Learning · Computer Science 2021-10-06 Lingwei Zhu, Toshinori Kitamura, Takamitsu Matsubara