English
Related papers

Related papers: Improving Actor-Critic Training with Steerable Act…

200 papers

Actor-critic methods, a type of model-free Reinforcement Learning, have been successfully applied to challenging tasks in continuous control, often achieving state-of-the art performance. However, wide-scale adoption of these methods in…

Machine Learning · Statistics 2019-10-29 Kamil Ciosek , Quan Vuong , Robert Loftin , Katja Hofmann

In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control. One of the primary drivers of this improved performance is the use of pessimistic value updates to…

Machine Learning · Computer Science 2022-04-07 Ted Moskovitz , Jack Parker-Holder , Aldo Pacchiano , Michael Arbel , Michael I. Jordan

Deep off-policy actor-critic algorithms have emerged as the leading framework for reinforcement learning in continuous control domains. However, most of these algorithms suffer from poor sample efficiency, especially in environments with…

Machine Learning · Computer Science 2026-02-25 Zahra Shahrooei , Ali Baheri

Off-policy actor-critic methods in reinforcement learning train a critic with temporal-difference updates and use it as a learning signal for the policy (actor). This design typically achieves higher sample efficiency than purely on-policy…

Machine Learning · Computer Science 2026-01-05 Uğurcan Özalp

Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement via two separate function approximators. The practicality of this approach comes at the expense of training instability, caused…

Machine Learning · Computer Science 2024-06-11 Bahareh Tasdighi , Abdullah Akgül , Manuel Haussmann , Kenny Kazimirzak Brink , Melih Kandemir

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani , Amirreza Kazemi , Reza Babanezhad , Nicolas Le Roux

Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm, essentially based on entropy regularization. SAC trains a policy by maximizing the trade-off between expected return and entropy (randomness in the…

Machine Learning · Computer Science 2021-09-27 Chayan Banerjee , Zhiyong Chen , Nasimul Noman

Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck. These…

Machine Learning · Computer Science 2021-02-09 Yannis Flet-Berliac , Johan Ferret , Olivier Pietquin , Philippe Preux , Matthieu Geist

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample…

This paper introduces the Active-Importance-Sampling Actor-Critic (AISAC) algorithm, an extension of the Actor-Critic framework for reducing variance in policy gradient estimation. AISAC optimizes the behavior policy to minimize gradient…

Machine Learning · Computer Science 2026-05-11 Majid Molaei , Gabor Paczolay , Matteo Papini , Alberto Maria Metelli , Marcello Restelli

Robust Reinforcement Learning aims to derive optimal behavior that accounts for model uncertainty in dynamical systems. However, previous studies have shown that by considering the worst case scenario, robust policies can be overly…

Machine Learning · Computer Science 2018-10-25 Esther Derman , Daniel J. Mankowitz , Timothy A. Mann , Shie Mannor

The actor-critic (AC) framework has achieved strong empirical success in off-policy reinforcement learning but suffers from the "moving target" problem, where the evaluated policy changes continually. Functional critics, or…

Machine Learning · Computer Science 2026-02-10 Qinxun Bai , Yuxuan Han , Wei Xu , Zhengyuan Zhou

Reinforcement Learning (RL) has shown great potential in complex control tasks, particularly when combined with deep neural networks within the Actor-Critic (AC) framework. However, in practical applications, balancing exploration, learning…

Robotics · Computer Science 2026-02-25 Zhiwei Shang , Xinyi Yuan , Wenjun Huang , Yunduan Cui , Di Chen , Meixin Zhu

Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to…

Machine Learning · Computer Science 2021-08-20 Andrea Zanette , Martin J. Wainwright , Emma Brunskill

We propose WSAC (Weighted Safe Actor-Critic), a novel algorithm for Safe Offline Reinforcement Learning (RL) under functional approximation, which can robustly optimize policies to improve upon an arbitrary reference policy with limited…

Machine Learning · Computer Science 2024-11-01 Honghao Wei , Xiyue Peng , Arnob Ghosh , Xin Liu

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism. ATAC is designed as a two-player…

Machine Learning · Computer Science 2022-07-07 Ching-An Cheng , Tengyang Xie , Nan Jiang , Alekh Agarwal

Soft actor-critic (SAC) in reinforcement learning is expected to be one of the next-generation robot control schemes. Its ability to maximize policy entropy would make a robotic controller robust to noise and perturbation, which is useful…

Machine Learning · Computer Science 2023-07-04 Taisuke Kobayashi

The oscillating performance of off-policy learning and persisting errors in the actor-critic (AC) setting call for algorithms that can conservatively learn to suit the stability-critical applications better. In this paper, we propose a…

Machine Learning · Computer Science 2021-10-06 Lingwei Zhu , Toshinori Kitamura , Takamitsu Matsubara

We focus on a simulation-based optimization problem of choosing the best design from the feasible space. Although the simulation model can be queried with finite samples, its internal processing rule cannot be utilized in the optimization…

Machine Learning · Computer Science 2021-11-02 Kuo Li , Qing-Shan Jia , Jiaqi Yan

The two-time scale nature of SAC, which is an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, it ensures…

‹ Prev 1 2 3 10 Next ›