Related papers: Soft Options Critic

SOAC: The Soft Option Actor-Critic Architecture

The option framework has shown great promise by automatically extracting temporally-extended sub-tasks from a long-horizon task. Methods have been proposed for concurrently learning low-level intra-option policies and high-level option…

Artificial Intelligence · Computer Science 2020-06-26 Chenghao Li , Xiaoteng Ma , Chongjie Zhang , Jun Yang , Li Xia , Qianchuan Zhao

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and…

Machine Learning · Computer Science 2018-08-10 Tuomas Haarnoja , Aurick Zhou , Pieter Abbeel , Sergey Levine

Soft Actor-Critic Algorithms and Applications

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample…

Machine Learning · Computer Science 2019-09-16 Tuomas Haarnoja , Aurick Zhou , Kristian Hartikainen , George Tucker , Sehoon Ha , Jie Tan , Vikash Kumar , Henry Zhu , Abhishek Gupta , Pieter Abbeel , Sergey Levine

Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience

Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm, essentially based on entropy regularization. SAC trains a policy by maximizing the trade-off between expected return and entropy (randomness in the…

Machine Learning · Computer Science 2021-09-27 Chayan Banerjee , Zhiyong Chen , Nasimul Noman

Soft Actor-Critic Algorithm with Truly-satisfied Inequality Constraint

Soft actor-critic (SAC) in reinforcement learning is expected to be one of the next-generation robot control schemes. Its ability to maximize policy entropy would make a robotic controller robust to noise and perturbation, which is useful…

Machine Learning · Computer Science 2023-07-04 Taisuke Kobayashi

Actor-critic is implicitly biased towards high entropy optimal policies

We show that the simplest actor-critic method -- a linear softmax policy updated with TD through interaction with a linear MDP, but featuring no explicit regularization or exploration -- does not merely find an optimal policy, but moreover…

Machine Learning · Computer Science 2022-03-15 Yuzheng Hu , Ziwei Ji , Matus Telgarsky

Soft Actor-Critic with Cross-Entropy Policy Optimization

Soft Actor-Critic (SAC) is one of the state-of-the-art off-policy reinforcement learning (RL) algorithms that is within the maximum entropy based RL framework. SAC is demonstrated to perform very well in a list of continous control tasks…

Machine Learning · Computer Science 2021-12-22 Zhenyang Shi , Surya P. N. Singh

Soft-Robust Actor-Critic Policy-Gradient

Robust Reinforcement Learning aims to derive optimal behavior that accounts for model uncertainty in dynamical systems. However, previous studies have shown that by considering the worst case scenario, robust policies can be overly…

Machine Learning · Computer Science 2018-10-25 Esther Derman , Daniel J. Mankowitz , Timothy A. Mann , Shie Mannor

The Point to Which Soft Actor-Critic Converges

Soft actor-critic is a successful successor over soft Q-learning. While lived under maximum entropy framework, their relationship is still unclear. In this paper, we prove that in the limit they converge to the same solution. This is…

Machine Learning · Computer Science 2023-05-30 Jianfei Ma

The Option-Critic Architecture

Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We…

Artificial Intelligence · Computer Science 2016-12-06 Pierre-Luc Bacon , Jean Harb , Doina Precup

Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

Deep Reinforcement Learning (DRL) algorithms for continuous action spaces are known to be brittle toward hyperparameters as well as \cut{being}sample inefficient. Soft Actor Critic (SAC) proposes an off-policy deep actor critic algorithm…

Machine Learning · Computer Science 2019-06-10 Patrick Nadeem Ward , Ariella Smofsky , Avishek Joey Bose

Hierarchical Soft Actor-Critic: Adversarial Exploration via Mutual Information Optimization

We describe a novel extension of soft actor-critics for hierarchical Deep Q-Networks (HDQN) architectures using mutual information metric. The proposed extension provides a suitable framework for encouraging explorations in such…

Machine Learning · Computer Science 2019-06-18 Ari Azarafrooz , John Brock

Dissecting Discrete Soft Actor-Critic: Limitations and Principled Alternatives

While Soft Actor-Critic (SAC) is highly effective in continuous control, its discrete counterpart (DSAC) performs poorly on challenging discrete-action domains such as Atari. Consequently, starting from DSAC, we revisit the design of…

Machine Learning · Computer Science 2026-05-13 Reza Asad , Reza Babanezhad , Sharan Vaswani

Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning

We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary…

Machine Learning · Computer Science 2019-02-18 Gang Chen , Yiming Peng

Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts

We study the robustness of deep reinforcement learning algorithms against distribution shifts within contextual multi-stage stochastic combinatorial optimization problems from the operations research domain. In this context, risk-sensitive…

Machine Learning · Computer Science 2024-02-16 Tobias Enders , James Harrison , Maximilian Schiffer

DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic

We present a novel extension to the family of Soft Actor-Critic (SAC) algorithms. We argue that based on the Maximum Entropy Principle, discrete SAC can be further improved via additional statistical constraints derived from a surrogate…

Machine Learning · Computer Science 2025-06-24 Dexter Neo , Tsuhan Chen

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step, to estimate the value of the new…

Machine Learning · Computer Science 2019-12-12 Riashat Islam , Raihan Seraj , Samin Yeasar Arnob , Doina Precup

Actor-Critics Can Achieve Optimal Sample Efficiency

Actor-critic algorithms have become a cornerstone in reinforcement learning (RL), leveraging the strengths of both policy-based and value-based methods. Despite recent progress in understanding their statistical efficiency, no existing work…

Machine Learning · Statistics 2025-05-07 Kevin Tan , Wei Fan , Yuting Wei

Better Exploration with Optimistic Actor-Critic

Actor-critic methods, a type of model-free Reinforcement Learning, have been successfully applied to challenging tasks in continuous control, often achieving state-of-the art performance. However, wide-scale adoption of these methods in…

Machine Learning · Statistics 2019-10-29 Kamil Ciosek , Quan Vuong , Robert Loftin , Katja Hofmann

Soft Actor-Critic with Inhibitory Networks for Faster Retraining

Reusing previously trained models is critical in deep reinforcement learning to speed up training of new agents. However, it is unclear how to acquire new skills when objectives and constraints are in conflict with previously learned…

Machine Learning · Computer Science 2022-02-09 Jaime S. Ide , Daria Mićović , Michael J. Guarino , Kevin Alcedo , David Rosenbluth , Adrian P. Pope