Related papers: Generative Actor-Critic: An Off-policy Algorithm U…

Off-Policy Actor-Critic

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on…

Machine Learning · Computer Science 2015-03-20 Thomas Degris, Martha White, Richard S. Sutton

Distributional Policy Optimization: An Alternative Approach for Continuous Control

We identify a fundamental problem in policy gradient-based methods in continuous control. As policy gradient methods require the agent's underlying probability distribution, they limit policy representation to parametric distribution…

Machine Learning · Computer Science 2019-11-26 Chen Tessler, Guy Tennenholtz, Shie Mannor

Guide Actor-Critic for Continuous Control

Actor-critic methods solve reinforcement learning problems by updating a parameterized policy known as an actor in a direction that increases an estimate of the expected return known as a critic. However, existing actor-critic methods only…

Machine Learning · Statistics 2018-02-23 Voot Tangkaratt, Abbas Abdolmaleki, Masashi Sugiyama

Generative Actor Critic

Conventional Reinforcement Learning (RL) algorithms, typically focused on estimating or maximizing expected returns, face challenges when refining offline pretrained models with online experiences. This paper introduces Generative Actor…

Machine Learning · Computer Science 2025-12-29 Aoyang Qin, Deqian Kong, Wei Wang, Ying Nian Wu, Song-Chun Zhu, Sirui Xie

Leveraging exploration in off-policy algorithms via normalizing flows

The ability to discover approximately optimal policies in domains with sparse rewards is crucial to applying reinforcement learning (RL) in many real-world scenarios. Approaches such as neural density models and continuous exploration…

Machine Learning · Computer Science 2019-09-25 Bogdan Mazoure, Thang Doan, Audrey Durand, R Devon Hjelm, Joelle Pineau

Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience

Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm, essentially based on entropy regularization. SAC trains a policy by maximizing the trade-off between expected return and entropy (randomness in the…

Machine Learning · Computer Science 2021-09-27 Chayan Banerjee, Zhiyong Chen, Nasimul Noman

Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning

We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary…

Machine Learning · Computer Science 2019-02-18 Gang Chen, Yiming Peng

Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

Deep Reinforcement Learning (DRL) algorithms for continuous action spaces are known to be brittle toward hyperparameters as well as sample inefficient. Soft Actor Critic (SAC) proposes an off-policy deep actor critic algorithm…

Machine Learning · Computer Science 2019-06-10 Patrick Nadeem Ward, Ariella Smofsky, Avishek Joey Bose

Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks. Examples of this behavior include the D4PG and DMPO…

Machine Learning · Computer Science 2022-04-25 Bobak Shahriari, Abbas Abdolmaleki, Arunkumar Byravan, Abe Friesen, Siqi Liu, Jost Tobias Springenberg, Nicolas Heess, Matt Hoffman, Martin Riedmiller

Soft Actor-Critic Algorithms and Applications

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample…

Machine Learning · Computer Science 2019-09-16 Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Designing off-policy reinforcement learning algorithms is typically a very challenging task, because a desirable iteration update often involves an expectation over an on-policy distribution. Prior off-policy actor-critic (AC) algorithms…

Machine Learning · Computer Science 2021-07-20 Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and…

Machine Learning · Computer Science 2018-08-10 Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine

Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration

In this paper, sample-aware policy entropy regularization is proposed to enhance the conventional policy entropy regularization for better exploration. Exploiting the sample distribution obtainable from the replay buffer, the proposed…

Machine Learning · Computer Science 2021-06-10 Seungyul Han, Youngchul Sung

Generalized Off-Policy Actor-Critic

We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting. Compared to the commonly used excursion objective, which…

Machine Learning · Computer Science 2019-10-29 Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

Towards Automatic Actor-Critic Solutions to Continuous Control

Model-free off-policy actor-critic methods are an efficient solution to complex continuous control tasks. However, these algorithms rely on a number of design tricks and hyperparameters, making their application to new domains difficult and…

Machine Learning · Computer Science 2021-10-26 Jake Grigsby, Jin Yong Yoo, Yanjun Qi

Behavior-Guided Actor-Critic: Improving Exploration via Learning Policy Behavior Representation for Deep Reinforcement Learning

In this work, we propose Behavior-Guided Actor-Critic (BAC), an off-policy actor-critic deep RL algorithm. BAC mathematically formulates the behavior of the policy through autoencoders by providing an accurate estimation of how frequently…

Machine Learning · Computer Science 2021-04-12 Ammar Fayad, Majd Ibrahim

An Off-policy Policy Gradient Theorem Using Emphatic Weightings

Policy gradient methods are widely used for control in reinforcement learning, particularly for the continuous action setting. There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence…

Machine Learning · Computer Science 2019-06-21 Ehsan Imani, Eric Graves, Martha White

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science 2023-01-16 Zaiwei Chen, Siva Theja Maguluri

Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL

Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks, by leveraging previously collected data for policy learning. However, most existing off-policy RL algorithms fail to maximally…

Machine Learning · Computer Science 2024-05-30 Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan

CGAR: Critic Guided Action Redistribution in Reinforcement Leaning

Training a game-playing reinforcement learning agent requires multiple interactions with the environment. Uninformed random exploration can waste time and resources, so it is essential to reduce such waste. As discussed in this…

Machine Learning · Computer Science 2022-06-24 Tairan Huang, Xu Li, Hao Li, Mingming Sun, Ping Li