English
Related papers

Related papers: PAC-Bayesian Soft Actor-Critic Learning

200 papers

Proximal Policy Optimization (PPO) has become the de facto standard for training legged robots, thanks to its robustness and scalability in massively parallel simulation environments like IsaacLab. However, its on-policy nature makes it…

Robotics · Computer Science 2026-05-26 Gianluca Sabatini , Chenhao Li , Marco Hutter

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample…

Robust Reinforcement Learning aims to derive optimal behavior that accounts for model uncertainty in dynamical systems. However, previous studies have shown that by considering the worst case scenario, robust policies can be overly…

Machine Learning · Computer Science 2018-10-25 Esther Derman , Daniel J. Mankowitz , Timothy A. Mann , Shie Mannor

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani , Amirreza Kazemi , Reza Babanezhad , Nicolas Le Roux

The two-time scale nature of SAC, which is an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, it ensures…

Reinforcement Learning (RL) has shown great potential in complex control tasks, particularly when combined with deep neural networks within the Actor-Critic (AC) framework. However, in practical applications, balancing exploration, learning…

Robotics · Computer Science 2026-02-25 Zhiwei Shang , Xinyi Yuan , Wenjun Huang , Yunduan Cui , Di Chen , Meixin Zhu

The Soft Actor-Critic (SAC) algorithm, a state-of-the-art method in maximum entropy reinforcement learning, traditionally relies on minimizing reverse Kullback-Leibler (KL) divergence for policy updates. However, this approach leads to an…

Machine Learning · Computer Science 2025-06-03 Yixian Zhang , Huaze Tang , Changxu Wei , Wenbo Ding

Although Reinforcement Learning (RL) is effective for sequential decision-making problems under uncertainty, it still fails to thrive in real-world systems where risk or safety is a binding constraint. In this paper, we formulate the RL…

Machine Learning · Computer Science 2022-07-07 Yannis Flet-Berliac , Debabrota Basu

Soft actor-critic (SAC) in reinforcement learning is expected to be one of the next-generation robot control schemes. Its ability to maximize policy entropy would make a robotic controller robust to noise and perturbation, which is useful…

Machine Learning · Computer Science 2023-07-04 Taisuke Kobayashi

Due to their complex nonlinear dynamics and batch-to-batch variability, batch processes pose a challenge for process control. Due to the absence of accurate models and resulting plant-model mismatch, these problems become harder to address…

Machine Learning · Computer Science 2022-05-03 Tanuja Joshi , Hariprasad Kodamana , Harikumar Kandath , Niket Kaisare

Recent advances in deep reinforcement learning have achieved impressive results in a wide range of complex tasks, but poor sample efficiency remains a major obstacle to real-world deployment. Soft actor-critic (SAC) mitigates this problem…

Machine Learning · Computer Science 2024-09-10 Luca Della Libera

Soft Actor-Critic (SAC) is one of the state-of-the-art off-policy reinforcement learning (RL) algorithms that is within the maximum entropy based RL framework. SAC is demonstrated to perform very well in a list of continous control tasks…

Machine Learning · Computer Science 2021-12-22 Zhenyang Shi , Surya P. N. Singh

Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start…

Machine Learning · Computer Science 2023-06-21 Hang Wang , Sen Lin , Junshan Zhang

Safety is essential for reinforcement learning (RL) applied in real-world situations. Chance constraints are suitable to represent the safety requirements in stochastic systems. Previous chance-constrained RL methods usually have a low…

Machine Learning · Computer Science 2021-03-17 Baiyu Peng , Yao Mu , Yang Guan , Shengbo Eben Li , Yuming Yin , Jianyu Chen

Off-policy actor-critic algorithms have shown strong potential in deep reinforcement learning for continuous control tasks. Their success primarily comes from leveraging pessimistic state-action value function updates, which reduce function…

Machine Learning · Computer Science 2025-08-21 Bahareh Tasdighi , Nicklas Werge , Yi-Shan Wu , Melih Kandemir

We derive a novel PAC-Bayesian generalization bound for reinforcement learning that explicitly accounts for Markov dependencies in the data, through the chain's mixing time. This contributes to overcoming challenges in obtaining…

Machine Learning · Computer Science 2026-02-10 Abdelkrim Zitouni , Mehdi Hennequin , Juba Agoun , Ryan Horache , Nadia Kabachi , Omar Rivasplata

We propose WSAC (Weighted Safe Actor-Critic), a novel algorithm for Safe Offline Reinforcement Learning (RL) under functional approximation, which can robustly optimize policies to improve upon an arbitrary reference policy with limited…

Machine Learning · Computer Science 2024-11-01 Honghao Wei , Xiyue Peng , Arnob Ghosh , Xin Liu

Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm, essentially based on entropy regularization. SAC trains a policy by maximizing the trade-off between expected return and entropy (randomness in the…

Machine Learning · Computer Science 2021-09-27 Chayan Banerjee , Zhiyong Chen , Nasimul Noman

Most prior approaches to offline reinforcement learning (RL) utilize \textit{behavior regularization}, typically augmenting existing off-policy actor critic algorithms with a penalty measuring divergence between the policy and the offline…

Machine Learning · Computer Science 2021-10-15 Haoran Xu , Xianyuan Zhan , Jianxiong Li , Honglei Yin

We focus on a simulation-based optimization problem of choosing the best design from the feasible space. Although the simulation model can be queried with finite samples, its internal processing rule cannot be utilized in the optimization…

Machine Learning · Computer Science 2021-11-02 Kuo Li , Qing-Shan Jia , Jiaqi Yan
‹ Prev 1 2 3 10 Next ›