Related papers: Sample Efficient Reinforcement Learning with REINF…

Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate

Reinforcement learning provides a mathematical framework for learning-based control, whose success largely depends on the amount of data it can utilize. The efficient utilization of historical trajectories obtained from previous policies is…

Machine Learning · Computer Science 2025-03-06 Yifan Lin, Yuhao Wang, Enlu Zhou

Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration

The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge in this scenario is how to reduce the variance of policy…

Machine Learning · Computer Science 2013-01-18 Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama

Policy Gradient Method For Robust Reinforcement Learning

This paper develops the first policy gradient method with global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. Robust reinforcement learning is to learn a policy robust to model…

Machine Learning · Computer Science 2022-05-17 Yue Wang, Shaofeng Zou

Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies

Standard reinforcement learning methods aim to master one way of solving a task whereas there may exist multiple near-optimal policies. Being able to identify this collection of near-optimal policies can allow a domain expert to efficiently…

Machine Learning · Computer Science 2019-06-04 Muhammad A. Masood, Finale Doshi-Velez

The Reinforce Policy Gradient Algorithm Revisited

We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random length episodes obtained from either termination upon reaching a goal state (as with…

Machine Learning · Computer Science 2023-10-10 Shalabh Bhatnagar

Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis

Reinforcement Learning (RL) has shown exceptional performance across various applications, enabling autonomous agents to learn optimal policies through interaction with their environments. However, traditional RL frameworks often face…

Machine Learning · Computer Science 2025-09-03 Rui Liu, Anish Gupta, Erfaun Noorani, Pratap Tokekar

Ranking Policy Gradient

Sample inefficiency is a long-lasting problem in reinforcement learning (RL). The state-of-the-art estimates the optimal action values while it usually involves an extensive search over the state-action space and unstable optimization.…

Machine Learning · Computer Science 2019-11-27 Kaixiang Lin, Jiayu Zhou

Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control

Policy gradient methods, where one searches for the policy of interest by maximizing the value functions using first-order information, become increasingly popular for sequential decision making in reinforcement learning, games, and…

Optimization and Control · Mathematics 2023-10-10 Shicong Cen, Yuejie Chi

Elementary Analysis of Policy Gradient Methods

Projected policy gradient under the simplex parameterization, policy gradient and natural policy gradient under the softmax parameterization, are fundamental algorithms in reinforcement learning. There have been a flurry of recent…

Optimization and Control · Mathematics 2024-04-12 Jiacai Liu, Wenye Li, Ke Wei

Reusing Trajectories in Policy Gradients Enables Fast Convergence

Policy gradient (PG) methods are a class of effective reinforcement learning algorithms, particularly when dealing with continuous control problems. They rely on fresh on-policy data, making them sample-inefficient and requiring…

Machine Learning · Computer Science 2026-02-03 Alessandro Montenegro, Federico Mansutti, Marco Mussi, Matteo Papini, Alberto Maria Metelli

Identifying Policy Gradient Subspaces

Policy gradient methods hold great potential for solving complex continuous control tasks. Still, their training efficiency can be improved by exploiting structure within the optimization problem. Recent work indicates that supervised…

Machine Learning · Computer Science 2024-03-19 Jan Schneider, Pierre Schumacher, Simon Guist, Le Chen, Daniel Häufle, Bernhard Schölkopf, Dieter Büchler

Trajectory-Based Off-Policy Deep Reinforcement Learning

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently…

Machine Learning · Computer Science 2019-05-15 Andreas Doerr, Michael Volpp, Marc Toussaint, Sebastian Trimpe, Christian Daniel

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Policy gradient (PG) gives rise to a rich class of reinforcement learning (RL) methods. Recently, there has been an emerging trend to accelerate the existing PG methods such as REINFORCE by the variance reduction techniques. However,…

Machine Learning · Computer Science 2021-05-31 Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvari, Mengdi Wang

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties,…

Machine Learning · Computer Science 2020-10-16 Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan

Towards Provable Log Density Policy Gradient

Policy gradient methods are a vital ingredient behind the success of modern reinforcement learning. Modern policy gradient methods, although successful, introduce a residual error in gradient estimation. In this work, we argue that this…

Machine Learning · Computer Science 2024-03-05 Pulkit Katdare, Anant Joshi, Katherine Driggs-Campbell

Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings

Policy gradient methods have been frequently applied to problems in control and reinforcement learning with great success, yet existing convergence analysis still relies on non-intuitive, impractical and often opaque conditions. In…

Machine Learning · Computer Science 2022-04-08 Matthew S. Zhang, Murat A. Erdogdu, Animesh Garg

Stabilizing Policy Gradient Methods via Reward Profiling

Policy gradient methods, which have been extensively studied in the last decade, offer an effective and efficient framework for reinforcement learning problems. However, their performances can often be unsatisfactory, suffering from…

Machine Learning · Computer Science 2026-01-27 Shihab Ahmed, El Houcine Bergou, Aritra Dutta, Yue Wang

Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models

We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation.…

Machine Learning · Computer Science 2023-11-07 Tyler Westenbroek, Jacob Levy, David Fridovich-Keil

Statistically Efficient Off-Policy Policy Gradients

Policy gradient methods in reinforcement learning update policy parameters by taking steps in the direction of an estimated gradient of policy value. In this paper, we consider the statistically efficient estimation of policy gradients from…

Machine Learning · Statistics 2020-02-21 Nathan Kallus, Masatoshi Uehara

Variance-Reduced Off-Policy Memory-Efficient Policy Search

Off-policy policy optimization is a challenging problem in reinforcement learning (RL). The algorithms designed for this problem often suffer from high variance in their estimators, which results in poor sample efficiency, and have issues…

Machine Learning · Computer Science 2020-09-15 Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu