Related papers: Phasic Policy Gradient

SAPG: Split and Aggregate Policy Gradients

Despite extreme sample inefficiency, on-policy reinforcement learning, aka policy gradients, has become a fundamental tool in decision-making problems. With the recent advances in GPU-driven simulation, the ability to collect large amounts…

Machine Learning · Computer Science 2024-07-30 Jayesh Singla, Ananye Agarwal, Deepak Pathak

Fractional Policy Gradients: Reinforcement Learning with Long-Term Memory

We propose Fractional Policy Gradients (FPG), a reinforcement learning framework incorporating fractional calculus for long-term temporal modeling in policy optimization. Standard policy gradient approaches face limitations from Markovian…

Machine Learning · Computer Science 2025-07-02 Urvi Pawar, Kunal Telangi

Sequential Policy Gradient for Adaptive Hyperparameter Optimization

Reinforcement learning is essential for neural architecture search and hyperparameter optimization, but the conventional approaches impede widespread use due to prohibitive time and computational costs. Inspired by DeepSeek-V3 multi-token…

Machine Learning · Computer Science 2025-06-19 Zheng Li, Jerry Cheng, Huanying Helen Gu

Rethinking Value Function Learning for Generalization in Reinforcement Learning

Our work focuses on training RL agents on multiple visually diverse environments to improve observational generalization performance. In prior methods, policy and value networks are separately optimized using a disjoint network architecture…

Machine Learning · Computer Science 2023-01-10 Seungyong Moon, JunYeong Lee, Hyun Oh Song

Mixed Policy Gradient: off-policy reinforcement learning driven jointly by data and model

Reinforcement learning (RL) shows great potential in sequential decision-making. At present, mainstream RL algorithms are data-driven, which usually yield better asymptotic performance but much slower convergence compared with model-driven…

Machine Learning · Computer Science 2024-02-27 Yang Guan, Jingliang Duan, Shengbo Eben Li, Jie Li, Jianyu Chen, Bo Cheng

Learning Permutations with Sinkhorn Policy Gradient

Many problems at the intersection of combinatorics and computer science require solving for a permutation that optimally matches, ranks, or sorts some data. These problems usually have a task-specific, often non-differentiable objective…

Machine Learning · Computer Science 2018-05-21 Patrick Emami, Sanjay Ranka

GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning

Reinforcement Learning (RL) can directly enhance the reasoning capabilities of large language models without extensive reliance on Supervised Fine-Tuning (SFT). In this work, we revisit the traditional Policy Gradient (PG) mechanism and…

Machine Learning · Computer Science 2026-02-04 Xiangxiang Chu, Hailang Huang, Xiao Zhang, Fei Wei, Yong Wang

Evolved Policy Gradients

We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve…

Machine Learning · Computer Science 2018-05-01 Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel

Episodic Policy Gradient Training

We introduce a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on-the-fly. Unlike other hyperparameter searches, we formulate…

Machine Learning · Computer Science 2021-12-06 Hung Le, Majid Abdolshah, Thommen K. George, Kien Do, Dung Nguyen, Svetha Venkatesh

Investigation on the generalization of the Sampled Policy Gradient algorithm

The Sampled Policy Gradient (SPG) algorithm is a new offline actor-critic variant that samples in the action space to approximate the policy gradient. It does so by using the critic to evaluate the sampled actions. SPG offers theoretical…

Machine Learning · Computer Science 2019-10-10 Nil Stolt Ansó

Reusing Trajectories in Policy Gradients Enables Fast Convergence

Policy gradient (PG) methods are a class of effective reinforcement learning algorithms, particularly when dealing with continuous control problems. They rely on fresh on-policy data, making them sample-inefficient and requiring…

Machine Learning · Computer Science 2026-02-03 Alessandro Montenegro, Federico Mansutti, Marco Mussi, Matteo Papini, Alberto Maria Metelli

Policy Gradient Guidance Enables Test Time Control

We introduce Policy Gradient Guidance (PGG), a simple extension of classifier-free guidance from diffusion models to classical policy gradient methods. PGG augments the policy gradient with an unconditional branch and interpolates…

Machine Learning · Computer Science 2025-10-03 Jianing Qi, Hao Tang, Zhigang Zhu

Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms

We study policy gradient (PG) for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). We represent the gradient of the value function with respect to a given…

Machine Learning · Computer Science 2022-07-26 Yanwei Jia, Xun Yu Zhou

On the Convergence of Projected Policy Gradient for Any Constant Step Sizes

Projected policy gradient (PPG) is a basic policy optimization method in reinforcement learning. Given access to exact policy evaluations, previous studies have established the sublinear convergence of PPG for sufficiently small step sizes…

Optimization and Control · Mathematics 2024-09-19 Jiacai Liu, Wenye Li, Dachao Lin, Ke Wei, Zhihua Zhang

Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients

Policy gradient methods are an attractive approach to multi-agent reinforcement learning problems due to their convergence properties and robustness in partially observable scenarios. However, there is a significant performance gap between…

Machine Learning · Computer Science 2021-05-07 Bozhidar Vasilev, Tarun Gupta, Bei Peng, Shimon Whiteson

Expected Policy Gradients for Reinforcement Learning

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when…

Machine Learning · Statistics 2020-05-05 Kamil Ciosek, Shimon Whiteson

Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs

Policy gradient methods can solve complex tasks but often fail when the dimensionality of the action-space or objective multiplicity grow very large. This occurs, in part, because the variance on score-based gradient estimators scales…

Machine Learning · Computer Science 2021-11-24 Thomas Spooner, Nelson Vadori, Sumitra Ganesh

Ranking Policy Gradient

Sample inefficiency is a long-lasting problem in reinforcement learning (RL). The state-of-the-art estimates the optimal action values while it usually involves an extensive search over the state-action space and unstable optimization.…

Machine Learning · Computer Science 2019-11-27 Kaixiang Lin, Jiayu Zhou

Fourier Policy Gradients

We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients as convolutions and turns them into multiplications.…

Machine Learning · Computer Science 2018-05-31 Matthew Fellows, Kamil Ciosek, Shimon Whiteson

Combining policy gradient and Q-learning

Policy gradient is an efficient technique for improving a policy in a reinforcement learning setting. However, vanilla online variants are on-policy only and not able to take advantage of off-policy data. In this paper we describe a new…

Machine Learning · Computer Science 2017-04-10 Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih