Related papers: Identifying Policy Gradient Subspaces

Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting

Policy gradient methods have shown success in learning control policies for high-dimensional dynamical systems. Their biggest downside is the amount of exploration they require before yielding high-performing policies. In a lifelong…

Machine Learning · Computer Science 2020-10-23 Jorge A. Mendez, Boyu Wang, Eric Eaton

Trajectory-Based Off-Policy Deep Reinforcement Learning

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently…

Machine Learning · Computer Science 2019-05-15 Andreas Doerr, Michael Volpp, Marc Toussaint, Sebastian Trimpe, Christian Daniel

Hindsight policy gradients

A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable…

Machine Learning · Computer Science 2019-02-21 Paulo Rauber, Avinash Ummadisingu, Filipe Mutz, Juergen Schmidhuber

Policy Gradient in Partially Observable Environments: Approximation and Convergence

Policy gradient is a generic and flexible reinforcement learning approach that generally enjoys simplicity in analysis, implementation, and deployment. In the last few decades, this approach has been extensively advanced for fully…

Machine Learning · Computer Science 2020-05-26 Kamyar Azizzadenesheli, Yisong Yue, Animashree Anandkumar

Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models

We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation.…

Machine Learning · Computer Science 2023-11-07 Tyler Westenbroek, Jacob Levy, David Fridovich-Keil

The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations

In recent years, various powerful policy gradient algorithms have been proposed in deep reinforcement learning. While all these algorithms build on the Policy Gradient Theorem, the specific design choices differ significantly across…

Machine Learning · Computer Science 2024-03-04 Matthias Lehmann

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties,…

Machine Learning · Computer Science 2020-10-16 Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan

Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies

Standard reinforcement learning methods aim to master one way of solving a task whereas there may exist multiple near-optimal policies. Being able to identify this collection of near-optimal policies can allow a domain expert to efficiently…

Machine Learning · Computer Science 2019-06-04 Muhammad A. Masood, Finale Doshi-Velez

Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and…

Optimization and Control · Mathematics 2022-10-11 Bin Hu, Kaiqing Zhang, Na Li, Mehran Mesbahi, Maryam Fazel, Tamer Başar

Partial Policy Gradients for RL in LLMs

Reinforcement learning is a framework for learning to act sequentially in an unknown environment. We propose a natural approach for modeling policy structure in policy gradients. The key idea is to optimize for a subset of future rewards:…

Machine Learning · Computer Science 2026-03-09 Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai

Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control

Policy gradient methods in reinforcement learning have become increasingly prevalent for state-of-the-art performance in continuous control tasks. Novel methods typically benchmark against a few key algorithms such as deep deterministic…

Machine Learning · Computer Science 2017-08-15 Riashat Islam, Peter Henderson, Maziar Gomrokchi, Doina Precup

Building a Subspace of Policies for Scalable Continual Learning

The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. Existing methods are typically based on either fixed-size models that struggle to learn a large number of diverse behaviors, or growing-size…

Machine Learning · Computer Science 2023-03-03 Jean-Baptiste Gaya, Thang Doan, Lucas Caccia, Laure Soulier, Ludovic Denoyer, Roberta Raileanu

Learning Optimal Deterministic Policies with Stochastic Policy Gradients

Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems. They learn stochastic parametric (hyper)policies by either exploring in the space of actions or in the space of parameters.…

Machine Learning · Computer Science 2024-05-31 Alessandro Montenegro, Marco Mussi, Alberto Maria Metelli, Matteo Papini

Policy Gradients for Probabilistic Constrained Reinforcement Learning

This paper considers the problem of learning safe policies in the context of reinforcement learning (RL). In particular, we consider the notion of probabilistic safety. That is, we aim to design policies that maintain the state of the…

Machine Learning · Computer Science 2023-04-20 Weiqin Chen, Dharmashankar Subramanian, Santiago Paternain

Gradient dynamics in reinforcement learning

Despite the success achieved by the analysis of supervised learning algorithms in the framework of statistical mechanics, reinforcement learning has remained largely untouched. Here we move towards closing the gap by analyzing the dynamics…

Statistical Mechanics · Physics 2022-09-02 Riccardo Fabbricatore, Vladimir V. Palyulin

Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration

The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge in this scenario is how to reduce the variance of policy…

Machine Learning · Computer Science 2013-01-18 Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama

Policy Gradient Method For Robust Reinforcement Learning

This paper develops the first policy gradient method with global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. Robust reinforcement learning is to learn a policy robust to model…

Machine Learning · Computer Science 2022-05-17 Yue Wang, Shaofeng Zou

Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control

Policy gradient methods, where one searches for the policy of interest by maximizing the value functions using first-order information, become increasingly popular for sequential decision making in reinforcement learning, games, and…

Optimization and Control · Mathematics 2023-10-10 Shicong Cen, Yuejie Chi

Policy Gradient Algorithms Implicitly Optimize by Continuation

Direct policy optimization in reinforcement learning is usually solved with policy-gradient algorithms, which optimize policy parameters via stochastic gradient ascent. This paper provides a new theoretical interpretation and justification…

Machine Learning · Computer Science 2023-10-24 Adrien Bolland, Gilles Louppe, Damien Ernst

Stabilizing Policy Gradient Methods via Reward Profiling

Policy gradient methods, which have been extensively studied in the last decade, offer an effective and efficient framework for reinforcement learning problems. However, their performances can often be unsatisfactory, suffering from…

Machine Learning · Computer Science 2026-01-27 Shihab Ahmed, El Houcine Bergou, Aritra Dutta, Yue Wang