English
Related papers

Related papers: Distillation Policy Optimization

200 papers

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on…

Machine Learning · Computer Science 2015-03-20 Thomas Degris , Martha White , Richard S. Sutton

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy…

Machine Learning · Computer Science 2023-09-27 Baturay Saglam , Dogan C. Cicek , Furkan B. Mutlu , Suleyman S. Kozat

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we…

Machine Learning · Computer Science 2026-01-06 Alexander W. Goodall , Edwin Hamel-De le Court , Francesco Belardinelli

In real-world decision making tasks, it is critical for data-driven reinforcement learning methods to be both stable and sample efficient. On-policy methods typically generate reliable policy improvement throughout training, while…

Machine Learning · Computer Science 2021-11-02 James Queeney , Ioannis Ch. Paschalidis , Christos G. Cassandras

Efficient utilization of the replay buffer plays a significant role in the off-policy actor-critic reinforcement learning (RL) algorithms used for model-free control policy synthesis for complex dynamical systems. We propose a method for…

Machine Learning · Computer Science 2024-02-13 Nikhil Kumar Singh , Indranil Saha

Off-policy policy optimization is a challenging problem in reinforcement learning (RL). The algorithms designed for this problem often suffer from high variance in their estimators, which results in poor sample efficiency, and have issues…

Machine Learning · Computer Science 2020-09-15 Daoming Lyu , Qi Qi , Mohammad Ghavamzadeh , Hengshuai Yao , Tianbao Yang , Bo Liu

Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy. However, there are two long-standing issues lying in this prediction problem that need to be tackled:…

Machine Learning · Computer Science 2021-12-30 Daoming Lyu , Bo Liu , Matthieu Geist , Wen Dong , Saad Biaz , Qi Wang

We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step, to estimate the value of the new…

Machine Learning · Computer Science 2019-12-12 Riashat Islam , Raihan Seraj , Samin Yeasar Arnob , Doina Precup

Improving the sample efficiency of reinforcement learning algorithms requires effective exploration. Following the principle of $\textit{optimism in the face of uncertainty}$ (OFU), we train a separate exploration policy to maximize the…

Machine Learning · Computer Science 2022-11-23 Jiachen Li , Shuo Cheng , Zhenyu Liao , Huayan Wang , William Yang Wang , Qinxun Bai

We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay (b) stability…

Machine Learning · Computer Science 2019-11-19 Simon Schmitt , Matteo Hessel , Karen Simonyan

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this…

Machine Learning · Computer Science 2021-02-04 Arushi Jain , Gandharv Patil , Ayush Jain , Khimya Khetarpal , Doina Precup

Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy. In this paper, we study a subtle…

Machine Learning · Computer Science 2022-10-12 Rujie Zhong , Duohan Zhang , Lukas Schäfer , Stefano V. Albrecht , Josiah P. Hanna

Policy gradient methods are widely adopted reinforcement learning algorithms for tasks with continuous action spaces. These methods succeeded in many application domains, however, because of their notorious sample inefficiency their use…

Machine Learning · Statistics 2024-02-20 Davide Mambelli , Stephan Bongers , Onno Zoeter , Matthijs T. J. Spaan , Frans A. Oliehoek

We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework. The first problem is implicit bias present in the reward functions used in these algorithms. While these biases might work well for…

Machine Learning · Computer Science 2018-10-16 Ilya Kostrikov , Kumar Krishna Agrawal , Debidatta Dwibedi , Sergey Levine , Jonathan Tompson

We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational…

Off-policy Actor-Critic algorithms have demonstrated phenomenal experimental performance but still require better explanations. To this end, we show its policy evaluation error on the distribution of transitions decomposes into: a Bellman…

Machine Learning · Computer Science 2021-10-07 Ting-Han Fan , Peter J. Ramadge

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science 2023-01-16 Zaiwei Chen , Siva Theja Maguluri

We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.

Machine Learning · Statistics 2016-07-19 S. A. Murphy , Y. Deng , E. B. Laber , H. R. Maei , R. S. Sutton , K. Witkiewitz

Policy iteration is one of the classical frameworks of reinforcement learning, which requires a known initial stabilizing control. However, finding the initial stabilizing control depends on the known system model. To relax this requirement…

Systems and Control · Electrical Eng. & Systems 2025-03-20 Dongdong Li , Jiuxiang Dong

Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to…

Machine Learning · Computer Science 2017-06-02 Shixiang Gu , Timothy Lillicrap , Zoubin Ghahramani , Richard E. Turner , Bernhard Schölkopf , Sergey Levine
‹ Prev 1 2 3 10 Next ›