Related papers: Improving Actor-Critic Training with Steerable Act…

Better Exploration with Optimistic Actor-Critic

Actor-critic methods, a type of model-free Reinforcement Learning, have been successfully applied to challenging tasks in continuous control, often achieving state-of-the-art performance. However, wide-scale adoption of these methods in…

Machine Learning · Statistics 2019-10-29 Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann

Tactical Optimism and Pessimism for Deep Reinforcement Learning

In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control. One of the primary drivers of this improved performance is the use of pessimistic value updates to…

Machine Learning · Computer Science 2022-04-07 Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael I. Jordan

Wasserstein Barycenter Soft Actor-Critic

Deep off-policy actor-critic algorithms have emerged as the leading framework for reinforcement learning in continuous control domains. However, most of these algorithms suffer from poor sample efficiency, especially in environments with…

Machine Learning · Computer Science 2026-02-25 Zahra Shahrooei, Ali Baheri

Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty

Off-policy actor-critic methods in reinforcement learning train a critic with temporal-difference updates and use it as a learning signal for the policy (actor). This design typically achieves higher sample efficiency than purely on-policy…

Machine Learning · Computer Science 2026-01-05 Uğurcan Özalp

PAC-Bayesian Soft Actor-Critic Learning

Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement via two separate function approximators. The practicality of this approach comes at the expense of training instability, caused…

Machine Learning · Computer Science 2024-06-11 Bahareh Tasdighi, Abdullah Akgül, Manuel Haussmann, Kenny Kazimirzak Brink, Melih Kandemir

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience

Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm, essentially based on entropy regularization. SAC trains a policy by maximizing the trade-off between expected return and entropy (randomness in the…

Machine Learning · Computer Science 2021-09-27 Chayan Banerjee, Zhiyong Chen, Nasimul Noman

Adversarially Guided Actor-Critic

Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck. These…

Machine Learning · Computer Science 2021-02-09 Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Soft Actor-Critic Algorithms and Applications

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample…

Machine Learning · Computer Science 2019-09-16 Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine

Actor-Critic with Active Importance Sampling

This paper introduces the Active-Importance-Sampling Actor-Critic (AISAC) algorithm, an extension of the Actor-Critic framework for reducing variance in policy gradient estimation. AISAC optimizes the behavior policy to minimize gradient…

Machine Learning · Computer Science 2026-05-11 Majid Molaei, Gabor Paczolay, Matteo Papini, Alberto Maria Metelli, Marcello Restelli

Soft-Robust Actor-Critic Policy-Gradient

Robust Reinforcement Learning aims to derive optimal behavior that accounts for model uncertainty in dynamical systems. However, previous studies have shown that by considering the worst case scenario, robust policies can be overly…

Machine Learning · Computer Science 2018-10-25 Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

Functional Critics Are Essential for Actor-Critic: From Off-Policy Stability to Efficient Exploration

The actor-critic (AC) framework has achieved strong empirical success in off-policy reinforcement learning but suffers from the "moving target" problem, where the evaluated policy changes continually. Functional critics, or…

Machine Learning · Computer Science 2026-02-10 Qinxun Bai, Yuxuan Han, Wei Xu, Zhengyuan Zhou

Effective Reinforcement Learning Control using Conservative Soft Actor-Critic

Reinforcement Learning (RL) has shown great potential in complex control tasks, particularly when combined with deep neural networks within the Actor-Critic (AC) framework. However, in practical applications, balancing exploration, learning…

Robotics · Computer Science 2026-02-25 Zhiwei Shang, Xinyi Yuan, Wenjun Huang, Yunduan Cui, Di Chen, Meixin Zhu

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to…

Machine Learning · Computer Science 2021-08-20 Andrea Zanette, Martin J. Wainwright, Emma Brunskill

Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning

We propose WSAC (Weighted Safe Actor-Critic), a novel algorithm for Safe Offline Reinforcement Learning (RL) under functional approximation, which can robustly optimize policies to improve upon an arbitrary reference policy with limited…

Machine Learning · Computer Science 2024-11-01 Honghao Wei, Xiyue Peng, Arnob Ghosh, Xin Liu

Adversarially Trained Actor Critic for Offline Reinforcement Learning

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism. ATAC is designed as a two-player…

Machine Learning · Computer Science 2022-07-07 Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal

Soft Actor-Critic Algorithm with Truly-satisfied Inequality Constraint

Soft actor-critic (SAC) in reinforcement learning is expected to be one of the next-generation robot control schemes. Its ability to maximize policy entropy would make a robotic controller robust to noise and perturbation, which is useful…

Machine Learning · Computer Science 2023-07-04 Taisuke Kobayashi

Cautious Actor-Critic

The oscillating performance of off-policy learning and persisting errors in the actor-critic (AC) setting call for algorithms that can conservatively learn to suit the stability-critical applications better. In this paper, we propose a…

Machine Learning · Computer Science 2021-10-06 Lingwei Zhu, Toshinori Kitamura, Takamitsu Matsubara

An Actor-Critic Method for Simulation-Based Optimization

We focus on a simulation-based optimization problem of choosing the best design from the feasible space. Although the simulation model can be queried with finite samples, its internal processing rule cannot be utilized in the optimization…

Machine Learning · Computer Science 2021-11-02 Kuo Li, Qing-Shan Jia, Jiaqi Yan

SARC: Soft Actor Retrospective Critic

The two-time scale nature of SAC, which is an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, it ensures…

Machine Learning · Computer Science 2023-06-30 Sukriti Verma, Ayush Chopra, Jayakumar Subramanian, Mausoom Sarkar, Nikaash Puri, Piyush Gupta, Balaji Krishnamurthy