Related papers: Model-Augmented Actor-Critic: Backpropagating thro…

How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

Deterministic-policy actor-critic algorithms for continuous control improve the actor by plugging its actions into the critic and ascending the action-value gradient, which is obtained by chaining the actor's Jacobian matrix with the…

Artificial Intelligence · Computer Science 2020-10-23 Pierluca D'Oro, Wojciech Jaśkowski

When to Trust Your Model: Model-Based Policy Optimization

Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy…

Machine Learning · Computer Science 2021-11-30 Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine

An Actor-Critic Method for Simulation-Based Optimization

We focus on a simulation-based optimization problem of choosing the best design from the feasible space. Although the simulation model can be queried with finite samples, its internal processing rule cannot be utilized in the optimization…

Machine Learning · Computer Science 2021-11-02 Kuo Li, Qing-Shan Jia, Jiaqi Yan

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps…

Machine Learning · Computer Science 2023-01-31 Harshat Kumar, Alec Koppel, Alejandro Ribeiro

Worst Cases Policy Gradients

Recent advances in deep reinforcement learning have demonstrated the capability of learning complex control policies from many types of environments. When learning policies for safety-critical applications, it is essential to be sensitive…

Machine Learning · Computer Science 2019-11-12 Yichuan Charlie Tang, Jian Zhang, Ruslan Salakhutdinov

Uncertainty-aware Model-based Policy Optimization

Model-based reinforcement learning has the potential to be more sample efficient than model-free approaches. However, existing model-based methods are vulnerable to model bias, which leads to poor generalization and asymptotic performance…

Machine Learning · Computer Science 2019-06-27 Tung-Long Vuong, Kenneth Tran

On-Policy Model Errors in Reinforcement Learning

Model-free reinforcement learning algorithms can compute policy gradients given sampled environment transitions, but require large amounts of data. In contrast, model-based methods can use the learned model to generate new data, but model…

Machine Learning · Computer Science 2022-03-04 Lukas P. Fröhlich, Maksym Lefarov, Melanie N. Zeilinger, Felix Berkenkamp

Topological Guided Actor-Critic Modular Learning of Continuous Systems with Temporal Objectives

This work investigates the formal policy synthesis of continuous-state stochastic dynamic systems given high-level specifications in linear temporal logic. To learn an optimal policy that maximizes the satisfaction probability, we take a…

Artificial Intelligence · Computer Science 2023-04-21 Lening Li, Zhentian Qian

Scalable Model-based Policy Optimization for Decentralized Networked Systems

Reinforcement learning algorithms require a large number of samples; this often limits their real-world applications on even simple tasks. Such a challenge is more pronounced in multi-agent tasks, as each step of operation is more costly…

Machine Learning · Computer Science 2022-09-05 Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang

An intelligent algorithmic trading based on a risk-return reinforcement learning algorithm

This scientific paper proposes a novel portfolio optimization model using an improved deep reinforcement learning algorithm. The objective function of the optimization model is the weighted sum of the expectation and value at risk (VaR) of…

Machine Learning · Computer Science 2022-08-30 Boyi Jin

Learning Value Functions in Deep Policy Gradients using Residual Variance

Policy gradient algorithms have proven to be successful in diverse decision making and control tasks. However, these methods suffer from high sample complexity and instability issues. In this paper, we address these challenges by providing…

Machine Learning · Computer Science 2021-03-17 Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

Value-based reinforcement-learning algorithms provide state-of-the-art results in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are limited by their need for an…

Machine Learning · Computer Science 2019-06-13 Denis Steckelmacher, Hélène Plisnier, Diederik M. Roijers, Ann Nowé

Actor Critic with Differentially Private Critic

Reinforcement learning algorithms are known to be sample inefficient, and often performance on one task can be substantially improved by leveraging information (e.g., via pre-training) on other related tasks. In this work, we propose a…

Machine Learning · Computer Science 2019-10-15 Jonathan Lebensold, William Hamilton, Borja Balle, Doina Precup

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science 2023-01-16 Zaiwei Chen, Siva Theja Maguluri

Tactical Optimism and Pessimism for Deep Reinforcement Learning

In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control. One of the primary drivers of this improved performance is the use of pessimistic value updates to…

Machine Learning · Computer Science 2022-04-07 Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael I. Jordan

Variance Penalized On-Policy and Off-Policy Actor-Critic

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this…

Machine Learning · Computer Science 2021-02-04 Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step, to estimate the value of the new…

Machine Learning · Computer Science 2019-12-12 Riashat Islam, Raihan Seraj, Samin Yeasar Arnob, Doina Precup

Off-Policy Actor-Critic

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on…

Machine Learning · Computer Science 2015-03-20 Thomas Degris, Martha White, Richard S. Sutton

Actor-Free Continuous Control via Structurally Maximizable Q-Functions

Value-based algorithms are a cornerstone of off-policy reinforcement learning due to their simplicity and training stability. However, their use has traditionally been restricted to discrete action spaces, as they rely on estimating…

Machine Learning · Computer Science 2025-10-23 Yigit Korkmaz, Urvi Bhuwania, Ayush Jain, Erdem Bıyık

Model Predictive Actor-Critic: Accelerating Robot Skill Acquisition with Deep Reinforcement Learning

Substantial advancements to model-based reinforcement learning algorithms have been impeded by the model-bias induced by the collected data, which generally hurts performance. Meanwhile, their inherent sample efficiency warrants utility for…

Robotics · Computer Science 2021-11-01 Andrew S. Morgan, Daljeet Nandha, Georgia Chalvatzaki, Carlo D'Eramo, Aaron M. Dollar, Jan Peters