Related papers: High-Dimensional Continuous Control Using Generali…

Reward-Adaptive Reinforcement Learning: Dynamic Policy Gradient Optimization for Bipedal Locomotion

Controlling a non-statically bipedal robot is challenging due to the complex dynamics and multi-criterion optimization involved. Recent works have demonstrated the effectiveness of deep reinforcement learning (DRL) for simulation and…

Robotics · Computer Science 2021-12-23 Changxin Huang , Guangrun Wang , Zhibo Zhou , Ronghui Zhang , Liang Lin

Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting

Policy gradient methods have shown success in learning control policies for high-dimensional dynamical systems. Their biggest downside is the amount of exploration they require before yielding high-performing policies. In a lifelong…

Machine Learning · Computer Science 2020-10-23 Jorge A. Mendez , Boyu Wang , Eric Eaton

The Local Optimality of Reinforcement Learning by Value Gradients, and its Relationship to Policy Gradient Learning

In this theoretical paper we are concerned with the problem of learning a value function by a smooth general function approximator, to solve a deterministic episodic control problem in a large continuous state space. It is shown that…

Machine Learning · Computer Science 2011-01-04 Michael Fairbank , Eduardo Alonso

AWD3: Dynamic Reduction of the Estimation Bias

Value-based deep Reinforcement Learning (RL) algorithms suffer from the estimation bias primarily caused by function approximation and temporal difference (TD) learning. This problem induces faulty state-action value estimates and therefore…

Machine Learning · Computer Science 2021-11-15 Dogan C. Cicek , Enes Duran , Baturay Saglam , Kagan Kaya , Furkan B. Mutlu , Suleyman S. Kozat

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science 2023-01-16 Zaiwei Chen , Siva Theja Maguluri

Trajectory-Based Off-Policy Deep Reinforcement Learning

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently…

Machine Learning · Computer Science 2019-05-15 Andreas Doerr , Michael Volpp , Marc Toussaint , Sebastian Trimpe , Christian Daniel

Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models

We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation.…

Machine Learning · Computer Science 2023-11-07 Tyler Westenbroek , Jacob Levy , David Fridovich-Keil

Policy Gradient Method For Robust Reinforcement Learning

This paper develops the first policy gradient method with global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. Robust reinforcement learning is to learn a policy robust to model…

Machine Learning · Computer Science 2022-05-17 Yue Wang , Shaofeng Zou

Stabilizing Policy Gradient Methods via Reward Profiling

Policy gradient methods, which have been extensively studied in the last decade, offer an effective and efficient framework for reinforcement learning problems. However, their performances can often be unsatisfactory, suffering from…

Machine Learning · Computer Science 2026-01-27 Shihab Ahmed , El Houcine Bergou , Aritra Dutta , Yue Wang

Scaling Life-long Off-policy Learning

We pursue a life-long learning approach to artificial intelligence that makes extensive use of reinforcement learning algorithms. We build on our prior work with general value functions (GVFs) and the Horde architecture. GVFs have been…

Artificial Intelligence · Computer Science 2012-06-28 Adam White , Joseph Modayil , Richard S. Sutton

Identifying Policy Gradient Subspaces

Policy gradient methods hold great potential for solving complex continuous control tasks. Still, their training efficiency can be improved by exploiting structure within the optimization problem. Recent work indicates that supervised…

Machine Learning · Computer Science 2024-03-19 Jan Schneider , Pierre Schumacher , Simon Guist , Le Chen , Daniel Häufle , Bernhard Schölkopf , Dieter Büchler

Learning Bipedal Walking for Humanoid Robots in Challenging Environments with Obstacle Avoidance

Deep reinforcement learning has seen successful implementations on humanoid robots to achieve dynamic walking. However, these implementations have been so far successful in simple environments void of obstacles. In this paper, we aim to…

Robotics · Computer Science 2024-10-14 Marwan Hamze , Mitsuharu Morisawa , Eiichi Yoshida

Reward-Conditioned Policies

Reinforcement learning offers the promise of automating the acquisition of complex behavioral skills. However, compared to commonly used and well-understood supervised learning methods, reinforcement learning algorithms can be brittle,…

Machine Learning · Computer Science 2020-01-01 Aviral Kumar , Xue Bin Peng , Sergey Levine

Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model 2) they are an…

Machine Learning · Computer Science 2019-03-26 Maryam Fazel , Rong Ge , Sham M. Kakade , Mehran Mesbahi

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper,…

Robotics · Computer Science 2019-10-10 Arunkumar Byravan , Jost Tobias Springenberg , Abbas Abdolmaleki , Roland Hafner , Michael Neunert , Thomas Lampe , Noah Siegel , Nicolas Heess , Martin Riedmiller

Policy Evaluation Networks

Many reinforcement learning algorithms use value functions to guide the search for better policies. These methods estimate the value of a single policy while generalizing across many states. The core idea of this paper is to flip this…

Machine Learning · Computer Science 2020-02-28 Jean Harb , Tom Schaul , Doina Precup , Pierre-Luc Bacon

Understanding the Effects of Second-Order Approximations in Natural Policy Gradient Reinforcement Learning

Natural policy gradient methods are popular reinforcement learning methods that improve the stability of policy gradient methods by utilizing second-order approximations to precondition the gradient with the inverse of the…

Machine Learning · Computer Science 2022-10-12 Brennan Gebotys , Alexander Wong , David A. Clausi

Data-efficient Deep Reinforcement Learning for Dexterous Manipulation

Deep learning and reinforcement learning methods have recently been used to solve a variety of problems in continuous control domains. An obvious application of these techniques is dexterous manipulation tasks in robotics which are…

Machine Learning · Computer Science 2017-04-12 Ivaylo Popov , Nicolas Heess , Timothy Lillicrap , Roland Hafner , Gabriel Barth-Maron , Matej Vecerik , Thomas Lampe , Yuval Tassa , Tom Erez , Martin Riedmiller

A Hybrid Approach for Reinforcement Learning Using Virtual Policy Gradient for Balancing an Inverted Pendulum

Using the policy gradient algorithm, we train a single-hidden-layer neural network to balance a physically accurate simulation of a single inverted pendulum. The trained weights and biases can then be transferred to a physical agent, where…

Machine Learning · Computer Science 2021-02-17 Dylan Bates

Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and…

Optimization and Control · Mathematics 2022-10-11 Bin Hu , Kaiqing Zhang , Na Li , Mehran Mesbahi , Maryam Fazel , Tamer Başar