Related papers: Actor-Critic based Improper Reinforcement Learning

Improper Reinforcement Learning with Gradient-based Policy Optimization

We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform…

Machine Learning · Computer Science 2021-07-06 Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

Optimal Actor-Critic Policy with Optimized Training Datasets

Actor-critic (AC) algorithms are known for their efficacy and high performance in solving reinforcement learning problems, but they also suffer from low sampling efficiency. An AC based policy optimization process is iterative and needs to…

Machine Learning · Computer Science 2021-12-02 Chayan Banerjee, Zhiyong Chen, Nasimul Noman, Mohsen Zamani

Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation

We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment. Previous policy-based robust RL algorithms…

Machine Learning · Computer Science 2023-12-12 Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps…

Machine Learning · Computer Science 2023-01-31 Harshat Kumar, Alec Koppel, Alejandro Ribeiro

An Actor-Critic Method for Simulation-Based Optimization

We focus on a simulation-based optimization problem of choosing the best design from the feasible space. Although the simulation model can be queried with finite samples, its internal processing rule cannot be utilized in the optimization…

Machine Learning · Computer Science 2021-11-02 Kuo Li, Qing-Shan Jia, Jiaqi Yan

Reinforcement Learning from Imperfect Demonstrations

Robust real-world learning should benefit from both demonstrations and interactions with the environment. Current approaches to learning from demonstration and reward perform supervised learning on expert demonstration data and use…

Artificial Intelligence · Computer Science 2019-05-31 Yang Gao, Huazhe Xu, Ji Lin, Fisher Yu, Sergey Levine, Trevor Darrell

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

Adviser-Actor-Critic: Eliminating Steady-State Error in Reinforcement Learning Control

High-precision control tasks present substantial challenges for reinforcement learning (RL) algorithms, frequently resulting in suboptimal performance attributed to network approximation inaccuracies and inadequate sample quality. These…

Machine Learning · Computer Science 2025-02-05 Donghe Chen, Yubin Peng, Tengjie Zheng, Han Wang, Chaoran Qu, Lin Cheng

Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as "off-policy control" in RL where an agent's objective is to compute an optimal policy based on the data…

Machine Learning · Computer Science 2022-06-16 Raghuram Bharadwaj Diddigi, Prateek Jain, Prabuchandran K. J., Shalabh Bhatnagar

Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms

As an important type of reinforcement learning algorithms, actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies. In the first nested-loop design, actor's one update of…

Machine Learning · Computer Science 2020-05-11 Tengyu Xu, Zhe Wang, Yingbin Liang

Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework. The first problem is implicit bias present in the reward functions used in these algorithms. While these biases might work well for…

Machine Learning · Computer Science 2018-10-16 Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, Jonathan Tompson

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science 2023-01-16 Zaiwei Chen, Siva Theja Maguluri

AC4MPC: Actor-Critic Reinforcement Learning for Nonlinear Model Predictive Control

Model predictive control (MPC) and reinforcement learning (RL) are two powerful control strategies with, arguably, complementary advantages. In this work, we show how actor-critic RL techniques can be leveraged to improve the performance of MPC. The RL critic is used…

Systems and Control · Electrical Eng. & Systems 2024-06-07 Rudolf Reiter, Andrea Ghezzi, Katrin Baumgärtner, Jasper Hoffmann, Robert D. McAllister, Moritz Diehl

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The…

Machine Learning · Computer Science 2012-06-26 Gergely Neu, Csaba Szepesvari

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and natural actor-critic (NAC) algorithms has been…

Machine Learning · Computer Science 2021-02-15 Tengyu Xu, Zhe Wang, Yingbin Liang

Soft-Robust Actor-Critic Policy-Gradient

Robust Reinforcement Learning aims to derive optimal behavior that accounts for model uncertainty in dynamical systems. However, previous studies have shown that by considering the worst case scenario, robust policies can be overly…

Machine Learning · Computer Science 2018-10-25 Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

Reinforcement Learning for Mixed-Integer Problems Based on MPC

Model Predictive Control has been recently proposed as policy approximation for Reinforcement Learning, offering a path towards safe and explainable Reinforcement Learning. This approach has been investigated for Q-learning and actor-critic…

Systems and Control · Electrical Eng. & Systems 2020-04-06 Sebastien Gros, Mario Zanon

Towards neural reinforcement learning for large deviations in nonequilibrium systems with memory

We introduce a reinforcement learning method for a class of non-Markov systems; our approach extends the actor-critic framework given by Rose et al. [New J. Phys. 23 013013 (2021)] for obtaining scaled cumulant generating functions…

Statistical Mechanics · Physics 2026-03-09 Venkata D. Pamulaparthy, Rosemary J. Harris

Guide Actor-Critic for Continuous Control

Actor-critic methods solve reinforcement learning problems by updating a parameterized policy known as an actor in a direction that increases an estimate of the expected return known as a critic. However, existing actor-critic methods only…

Machine Learning · Statistics 2018-02-23 Voot Tangkaratt, Abbas Abdolmaleki, Masashi Sugiyama