Related papers: A Self-Tuning Actor-Critic Algorithm

Metatrace Actor-Critic: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control

Reinforcement learning (RL) has had many successes in both "deep" and "shallow" settings. In both cases, significant hyperparameter tuning is often required to achieve good performance. Furthermore, when nonlinear function approximation is…

Machine Learning · Computer Science 2019-05-27 Kenny Young , Baoxiang Wang , Matthew E. Taylor

Mean Actor Critic

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient…

Machine Learning · Statistics 2018-05-24 Cameron Allen , Kavosh Asadi , Melrose Roderick , Abdel-rahman Mohamed , George Konidaris , Michael Littman

Adviser-Actor-Critic: Eliminating Steady-State Error in Reinforcement Learning Control

High-precision control tasks present substantial challenges for reinforcement learning (RL) algorithms, frequently resulting in suboptimal performance attributed to network approximation inaccuracies and inadequate sample quality.These…

Machine Learning · Computer Science 2025-02-05 Donghe Chen , Yubin Peng , Tengjie Zheng , Han Wang , Chaoran Qu , Lin Cheng

Reinforcement Learning for Learning Rate Control

Stochastic gradient descent (SGD), which updates the model parameters by adding a local gradient times a learning rate at each step, is widely used in model training of machine learning algorithms such as neural networks. It is observed…

Machine Learning · Computer Science 2017-06-01 Chang Xu , Tao Qin , Gang Wang , Tie-Yan Liu

Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference, and the critic in turn provides a loss for the…

Machine Learning · Computer Science 2020-11-03 Wei Zhou , Yiying Li , Yongxin Yang , Huaimin Wang , Timothy M. Hospedales

Guide Actor-Critic for Continuous Control

Actor-critic methods solve reinforcement learning problems by updating a parameterized policy known as an actor in a direction that increases an estimate of the expected return known as a critic. However, existing actor-critic methods only…

Machine Learning · Statistics 2018-02-23 Voot Tangkaratt , Abbas Abdolmaleki , Masashi Sugiyama

Actor-Critic Reinforcement Learning with Phased Actor

Policy gradient methods in actor-critic reinforcement learning (RL) have become perhaps the most promising approaches to solving continuous optimal control problems. However, the trial-and-error nature of RL and the inherent randomness…

Machine Learning · Computer Science 2024-04-19 Ruofan Wu , Junmin Zhong , Jennie Si

Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model

Deep reinforcement learning (RL) algorithms can use high-capacity deep networks to learn directly from image observations. However, these high-dimensional observation spaces present a number of challenges in practice, since the policy must…

Machine Learning · Computer Science 2020-10-27 Alex X. Lee , Anusha Nagabandi , Pieter Abbeel , Sergey Levine

Soft Actor-Critic Algorithms and Applications

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample…

Machine Learning · Computer Science 2019-09-16 Tuomas Haarnoja , Aurick Zhou , Kristian Hartikainen , George Tucker , Sehoon Ha , Jie Tan , Vikash Kumar , Henry Zhu , Abhishek Gupta , Pieter Abbeel , Sergey Levine

Behavior-Guided Actor-Critic: Improving Exploration via Learning Policy Behavior Representation for Deep Reinforcement Learning

In this work, we propose Behavior-Guided Actor-Critic (BAC), an off-policy actor-critic deep RL algorithm. BAC mathematically formulates the behavior of the policy through autoencoders by providing an accurate estimation of how frequently…

Machine Learning · Computer Science 2021-04-12 Ammar Fayad , Majd Ibrahim

GRAC: Self-Guided and Self-Regularized Actor-Critic

Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network…

Machine Learning · Computer Science 2020-11-12 Lin Shao , Yifan You , Mengyuan Yan , Qingyun Sun , Jeannette Bohg

Meta-Gradient Reinforcement Learning with an Objective Discovered Online

Deep reinforcement learning includes a broad family of algorithms that parameterise an internal representation, such as a value function or policy, by a deep neural network. Each algorithm optimises its parameters with respect to an…

Machine Learning · Computer Science 2020-07-17 Zhongwen Xu , Hado van Hasselt , Matteo Hessel , Junhyuk Oh , Satinder Singh , David Silver

Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty

Off-policy actor-critic methods in reinforcement learning train a critic with temporal-difference updates and use it as a learning signal for the policy (actor). This design typically achieves higher sample efficiency than purely on-policy…

Machine Learning · Computer Science 2026-01-05 Uğurcan Özalp

Metaoptimization on a Distributed System for Deep Reinforcement Learning

Training intelligent agents through reinforcement learning is a notoriously unstable procedure. Massive parallelization on GPUs and distributed systems has been exploited to generate a large amount of training experiences and consequently…

Machine Learning · Computer Science 2019-02-08 Greg Heinrich , Iuri Frosio

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani , Amirreza Kazemi , Reza Babanezhad , Nicolas Le Roux

A Method for Fast Autonomy Transfer in Reinforcement Learning

This paper introduces a novel reinforcement learning (RL) strategy designed to facilitate rapid autonomy transfer by utilizing pre-trained critic value functions from multiple environments. Unlike traditional methods that require extensive…

Machine Learning · Computer Science 2024-07-31 Dinuka Sahabandu , Bhaskar Ramasubramanian , Michail Alexiou , J. Sukarno Mertoguno , Linda Bushnell , Radha Poovendran

Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience

Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm, essentially based on entropy regularization. SAC trains a policy by maximizing the trade-off between expected return and entropy (randomness in the…

Machine Learning · Computer Science 2021-09-27 Chayan Banerjee , Zhiyong Chen , Nasimul Noman

Instance Selection for Dynamic Algorithm Configuration with Reinforcement Learning: Improving Generalization

Dynamic Algorithm Configuration (DAC) addresses the challenge of dynamically setting hyperparameters of an algorithm for a diverse set of instances rather than focusing solely on individual tasks. Agents trained with Deep Reinforcement…

Machine Learning · Computer Science 2024-07-19 Carolin Benjamins , Gjorgjina Cenikj , Ana Nikolikj , Aditya Mohan , Tome Eftimov , Marius Lindauer

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

Meta SAC-Lag: Towards Deployable Safe Reinforcement Learning via MetaGradient-based Hyperparameter Tuning

Safe Reinforcement Learning (Safe RL) is one of the prevalently studied subcategories of trial-and-error-based methods with the intention to be deployed on real-world systems. In safe RL, the goal is to maximize reward performance while…

Machine Learning · Computer Science 2024-08-16 Homayoun Honari , Amir Mehdi Soufi Enayati , Mehran Ghafarian Tamizi , Homayoun Najjaran