Machine Learning · Computer Science
Provably Efficient Exploration in Policy Optimization
Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang
2024-04-02
Machine Learning · Computer Science
On-Policy RL with Optimal Reward Baseline
Yaru Hao, Li Dong, Xun Wu, Shaohan Huang +2
2025-06-05
Machine Learning · Computer Science
Truly Proximal Policy Optimization
Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan
2020-01-15
Machine Learning · Computer Science
Transductive Off-policy Proximal Policy Optimization
Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu +1
2024-06-07
Machine Learning · Computer Science
Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford +1
2017-08-29
Machine Learning · Computer Science
Accelerating Proximal Policy Optimization Learning Using Task Prediction for Solving Environments with Delayed Rewards
Ahmad Ahmad, Mehdi Kermanshah, Kevin Leahy, Zachary Serlin +5
2024-12-06
Machine Learning · Computer Science
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
Junbo Li, Peng Zhou, Rui Meng, Meet P. Vadera +2
2026-01-27
Machine Learning · Computer Science
Constrained Policy Optimization
Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel
2017-05-31
Machine Learning · Computer Science
Proximal Policy Optimization with Mixed Distributed Training
Zhenyu Zhang, Xiangfeng Luo, Tong Liu, Shaorong Xie +4
2019-10-01
Machine Learning · Computer Science
Proximal Policy Optimization via Enhanced Exploration Efficiency
Junwei Zhang, Zhenghao Zhang, Shuai Han, Shuai Lü
2020-11-12
Machine Learning · Computer Science
Absolute Policy Optimization
Weiye Zhao, Feihan Li, Yifan Sun, Rui Chen +2
2024-05-31
Machine Learning · Computer Science
Reflective Policy Optimization
Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing
2024-06-07
Machine Learning · Computer Science
Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
Youssef Mroueh, Nicolas Dupuis, Brian Belgodere, Apoorva Nitsure +5
2025-06-02
Machine Learning · Computer Science
Penalized Proximal Policy Optimization for Safe Reinforcement Learning
Linrui Zhang, Li Shen, Long Yang, Shixiang Chen +3
2022-06-20
Machine Learning · Computer Science
Beyond Reward: Offline Preference-guided Policy Optimization
Yachen Kang, Diyuan Shi, Jinxin Liu, Li He +1
2023-06-12
Machine Learning · Computer Science
Trust Region-Guided Proximal Policy Optimization
Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan
2019-11-11
Machine Learning · Computer Science
Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information
Jana Mayer, Johannes Westermann, Juan Pedro Gutiérrez H. Muriedas, Uwe Mettin +1
2021-07-21
Machine Learning · Computer Science
Beyond the Boundaries of Proximal Policy Optimization
Charlie B. Tan, Edan Toledo, Benjamin Ellis, Jakob N. Foerster +1
2024-11-04