Related papers: Phasic Policy Gradient

200 papers

Despite extreme sample inefficiency, on-policy reinforcement learning, aka policy gradients, has become a fundamental tool in decision-making problems. With the recent advances in GPU-driven simulation, the ability to collect large amounts…

Machine Learning · Computer Science 2024-07-30 Jayesh Singla, Ananye Agarwal, Deepak Pathak

We propose Fractional Policy Gradients (FPG), a reinforcement learning framework incorporating fractional calculus for long-term temporal modeling in policy optimization. Standard policy gradient approaches face limitations from Markovian…

Machine Learning · Computer Science 2025-07-02 Urvi Pawar, Kunal Telangi

Reinforcement learning is essential for neural architecture search and hyperparameter optimization, but the conventional approaches impede widespread use due to prohibitive time and computational costs. Inspired by DeepSeek-V3 multi-token…

Machine Learning · Computer Science 2025-06-19 Zheng Li, Jerry Cheng, Huanying Helen Gu

Our work focuses on training RL agents on multiple visually diverse environments to improve observational generalization performance. In prior methods, policy and value networks are separately optimized using a disjoint network architecture…

Machine Learning · Computer Science 2023-01-10 Seungyong Moon, JunYeong Lee, Hyun Oh Song

Reinforcement learning (RL) shows great potential in sequential decision-making. At present, mainstream RL algorithms are data-driven, which usually yield better asymptotic performance but much slower convergence compared with model-driven…

Machine Learning · Computer Science 2024-02-27 Yang Guan, Jingliang Duan, Shengbo Eben Li, Jie Li, Jianyu Chen, Bo Cheng

Many problems at the intersection of combinatorics and computer science require solving for a permutation that optimally matches, ranks, or sorts some data. These problems usually have a task-specific, often non-differentiable objective…

Machine Learning · Computer Science 2018-05-21 Patrick Emami, Sanjay Ranka

Reinforcement Learning (RL) can directly enhance the reasoning capabilities of large language models without extensive reliance on Supervised Fine-Tuning (SFT). In this work, we revisit the traditional Policy Gradient (PG) mechanism and…

Machine Learning · Computer Science 2026-02-04 Xiangxiang Chu, Hailang Huang, Xiao Zhang, Fei Wei, Yong Wang

We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve…

Machine Learning · Computer Science 2018-05-01 Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel

We introduce a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on-the-fly. Unlike other hyperparameter searches, we formulate…

Machine Learning · Computer Science 2021-12-06 Hung Le, Majid Abdolshah, Thommen K. George, Kien Do, Dung Nguyen, Svetha Venkatesh

The Sampled Policy Gradient (SPG) algorithm is a new offline actor-critic variant that samples in the action space to approximate the policy gradient. It does so by using the critic to evaluate the sampled actions. SPG offers theoretical…

Machine Learning · Computer Science 2019-10-10 Nil Stolt Ansó

Policy gradient (PG) methods are a class of effective reinforcement learning algorithms, particularly when dealing with continuous control problems. They rely on fresh on-policy data, making them sample-inefficient and requiring…

Machine Learning · Computer Science 2026-02-03 Alessandro Montenegro, Federico Mansutti, Marco Mussi, Matteo Papini, Alberto Maria Metelli

We introduce Policy Gradient Guidance (PGG), a simple extension of classifier-free guidance from diffusion models to classical policy gradient methods. PGG augments the policy gradient with an unconditional branch and interpolates…

Machine Learning · Computer Science 2025-10-03 Jianing Qi, Hao Tang, Zhigang Zhu

We study policy gradient (PG) for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). We represent the gradient of the value function with respect to a given…

Machine Learning · Computer Science 2022-07-26 Yanwei Jia, Xun Yu Zhou

Projected policy gradient (PPG) is a basic policy optimization method in reinforcement learning. Given access to exact policy evaluations, previous studies have established the sublinear convergence of PPG for sufficiently small step sizes…

Optimization and Control · Mathematics 2024-09-19 Jiacai Liu, Wenye Li, Dachao Lin, Ke Wei, Zhihua Zhang

Policy gradient methods are an attractive approach to multi-agent reinforcement learning problems due to their convergence properties and robustness in partially observable scenarios. However, there is a significant performance gap between…

Machine Learning · Computer Science 2021-05-07 Bozhidar Vasilev, Tarun Gupta, Bei Peng, Shimon Whiteson

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when…

Machine Learning · Statistics 2020-05-05 Kamil Ciosek, Shimon Whiteson

Policy gradient methods can solve complex tasks but often fail when the dimensionality of the action-space or objective multiplicity grow very large. This occurs, in part, because the variance on score-based gradient estimators scales…

Machine Learning · Computer Science 2021-11-24 Thomas Spooner, Nelson Vadori, Sumitra Ganesh

Sample inefficiency is a long-lasting problem in reinforcement learning (RL). The state-of-the-art estimates the optimal action values while it usually involves an extensive search over the state-action space and unstable optimization.…

Machine Learning · Computer Science 2019-11-27 Kaixiang Lin, Jiayu Zhou

We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients as convolutions and turns them into multiplications.…

Machine Learning · Computer Science 2018-05-31 Matthew Fellows, Kamil Ciosek, Shimon Whiteson

Policy gradient is an efficient technique for improving a policy in a reinforcement learning setting. However, vanilla online variants are on-policy only and not able to take advantage of off-policy data. In this paper we describe a new…

Machine Learning · Computer Science 2017-04-10 Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih