Related papers: PAC-Bayesian Soft Actor-Critic Learning

Bridging the Gap: Enabling Soft Actor Critic for High Performance Legged Locomotion

Proximal Policy Optimization (PPO) has become the de facto standard for training legged robots, thanks to its robustness and scalability in massively parallel simulation environments like IsaacLab. However, its on-policy nature makes it…

Robotics · Computer Science 2026-05-26 Gianluca Sabatini, Chenhao Li, Marco Hutter

Soft Actor-Critic Algorithms and Applications

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample…

Machine Learning · Computer Science 2019-09-16 Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine

Soft-Robust Actor-Critic Policy-Gradient

Robust Reinforcement Learning aims to derive optimal behavior that accounts for model uncertainty in dynamical systems. However, previous studies have shown that by considering the worst case scenario, robust policies can be overly…

Machine Learning · Computer Science 2018-10-25 Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

SARC: Soft Actor Retrospective Critic

The two-time scale nature of SAC, which is an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, it ensures…

Machine Learning · Computer Science 2023-06-30 Sukriti Verma, Ayush Chopra, Jayakumar Subramanian, Mausoom Sarkar, Nikaash Puri, Piyush Gupta, Balaji Krishnamurthy

Effective Reinforcement Learning Control using Conservative Soft Actor-Critic

Reinforcement Learning (RL) has shown great potential in complex control tasks, particularly when combined with deep neural networks within the Actor-Critic (AC) framework. However, in practical applications, balancing exploration, learning…

Robotics · Computer Science 2026-02-25 Zhiwei Shang, Xinyi Yuan, Wenjun Huang, Yunduan Cui, Di Chen, Meixin Zhu

Bidirectional Soft Actor-Critic: Leveraging Forward and Reverse KL Divergence for Efficient Reinforcement Learning

The Soft Actor-Critic (SAC) algorithm, a state-of-the-art method in maximum entropy reinforcement learning, traditionally relies on minimizing reverse Kullback-Leibler (KL) divergence for policy updates. However, this approach leads to an…

Machine Learning · Computer Science 2025-06-03 Yixian Zhang, Huaze Tang, Changxu Wei, Wenbo Ding

SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics

Although Reinforcement Learning (RL) is effective for sequential decision-making problems under uncertainty, it still fails to thrive in real-world systems where risk or safety is a binding constraint. In this paper, we formulate the RL…

Machine Learning · Computer Science 2022-07-07 Yannis Flet-Berliac, Debabrota Basu

Soft Actor-Critic Algorithm with Truly-satisfied Inequality Constraint

Soft actor-critic (SAC) in reinforcement learning is expected to be one of the next-generation robot control schemes. Its ability to maximize policy entropy would make a robotic controller robust to noise and perturbation, which is useful…

Machine Learning · Computer Science 2023-07-04 Taisuke Kobayashi

TASAC: a twin-actor reinforcement learning framework with stochastic policy for batch process control

Due to their complex nonlinear dynamics and batch-to-batch variability, batch processes pose a challenge for process control. Due to the absence of accurate models and resulting plant-model mismatch, these problems become harder to address…

Machine Learning · Computer Science 2022-05-03 Tanuja Joshi, Hariprasad Kodamana, Harikumar Kandath, Niket Kaisare

Soft Actor-Critic with Beta Policy via Implicit Reparameterization Gradients

Recent advances in deep reinforcement learning have achieved impressive results in a wide range of complex tasks, but poor sample efficiency remains a major obstacle to real-world deployment. Soft actor-critic (SAC) mitigates this problem…

Machine Learning · Computer Science 2024-09-10 Luca Della Libera

Soft Actor-Critic with Cross-Entropy Policy Optimization

Soft Actor-Critic (SAC) is one of the state-of-the-art off-policy reinforcement learning (RL) algorithms that is within the maximum entropy based RL framework. SAC is demonstrated to perform very well in a list of continuous control tasks…

Machine Learning · Computer Science 2021-12-22 Zhenyang Shi, Surya P. N. Singh

Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap

Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start…

Machine Learning · Computer Science 2023-06-21 Hang Wang, Sen Lin, Junshan Zhang

Model-Based Actor-Critic with Chance Constraint for Stochastic System

Safety is essential for reinforcement learning (RL) applied in real-world situations. Chance constraints are suitable to represent the safety requirements in stochastic systems. Previous chance-constrained RL methods usually have a low…

Machine Learning · Computer Science 2021-03-17 Baiyu Peng, Yao Mu, Yang Guan, Shengbo Eben Li, Yuming Yin, Jianyu Chen

Improving Actor-Critic Training with Steerable Action-Value Approximation Errors

Off-policy actor-critic algorithms have shown strong potential in deep reinforcement learning for continuous control tasks. Their success primarily comes from leveraging pessimistic state-action value function updates, which reduce function…

Machine Learning · Computer Science 2025-08-21 Bahareh Tasdighi, Nicklas Werge, Yi-Shan Wu, Melih Kandemir

PAC-Bayesian Reinforcement Learning Trains Generalizable Policies

We derive a novel PAC-Bayesian generalization bound for reinforcement learning that explicitly accounts for Markov dependencies in the data, through the chain's mixing time. This contributes to overcoming challenges in obtaining…

Machine Learning · Computer Science 2026-02-10 Abdelkrim Zitouni, Mehdi Hennequin, Juba Agoun, Ryan Horache, Nadia Kabachi, Omar Rivasplata

Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning

We propose WSAC (Weighted Safe Actor-Critic), a novel algorithm for Safe Offline Reinforcement Learning (RL) under functional approximation, which can robustly optimize policies to improve upon an arbitrary reference policy with limited…

Machine Learning · Computer Science 2024-11-01 Honghao Wei, Xiyue Peng, Arnob Ghosh, Xin Liu

Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience

Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm, essentially based on entropy regularization. SAC trains a policy by maximizing the trade-off between expected return and entropy (randomness in the…

Machine Learning · Computer Science 2021-09-27 Chayan Banerjee, Zhiyong Chen, Nasimul Noman

Offline Reinforcement Learning with Soft Behavior Regularization

Most prior approaches to offline reinforcement learning (RL) utilize behavior regularization, typically augmenting existing off-policy actor critic algorithms with a penalty measuring divergence between the policy and the offline…

Machine Learning · Computer Science 2021-10-15 Haoran Xu, Xianyuan Zhan, Jianxiong Li, Honglei Yin

An Actor-Critic Method for Simulation-Based Optimization

We focus on a simulation-based optimization problem of choosing the best design from the feasible space. Although the simulation model can be queried with finite samples, its internal processing rule cannot be utilized in the optimization…

Machine Learning · Computer Science 2021-11-02 Kuo Li, Qing-Shan Jia, Jiaqi Yan