Related papers: Guided Policy Optimization under Partial Observabi…

200 papers

Due to recent breakthroughs, reinforcement learning (RL) has demonstrated impressive performance in challenging sequential decision-making problems. However, an open question is how to make RL cope with partial observability which is…

Machine Learning · Computer Science 2021-04-23 Stephan Weigand, Pascal Klink, Jan Peters, Joni Pajarinen

When learning common skills like driving, beginners usually have domain experts standing by to ensure the safety of the learning process. We formulate such a learning scheme as Expert-in-the-loop Reinforcement Learning, where a guardian…

Artificial Intelligence · Computer Science 2021-11-02 Zhenghao Peng, Quanyi Li, Chunxiao Liu, Bolei Zhou

Deep reinforcement learning (RL) uses model-free techniques to optimize task-specific control policies. Despite having emerged as a promising approach for complex problems, RL is still hard to use reliably for real-world applications. Apart…

Robotics · Computer Science 2020-02-25 Siddhant Gangapurwala, Alexander Mitchell, Ioannis Havoutis

We revisit Group Relative Policy Optimization (GRPO) in both on-policy and off-policy optimization regimes. Our motivation comes from recent work on off-policy Proximal Policy Optimization (PPO), which improves training stability, sampling…

Training reinforcement learning (RL) policies for legged robots remains challenging due to high-dimensional continuous actions, hardware constraints, and limited exploration. Existing methods for locomotion and whole-body control work well…

Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a powerful paradigm for facilitating the self-improvement of large language models (LLMs), particularly in the domain of complex reasoning tasks. However,…

Machine Learning · Computer Science 2025-07-17 Ziru Liu, Cheng Gong, Xinyu Fu, Yaofang Liu, Ran Chen, Shoubo Hu, Suiyun Zhang, Rui Liu, Qingfu Zhang, Dandan Tu

Group Relative Policy Optimization (GRPO) has shown promise in discrete action spaces by eliminating value function dependencies through group-based advantage estimation. However, its application to continuous control remains unexplored,…

Robotics · Computer Science 2025-07-29 Rajat Khanda, Mohammad Baqar, Sambuddha Chakrabarti, Satyasaran Changdar

As language models become increasingly capable, users expect them to provide not only accurate responses but also behaviors aligned with diverse human preferences across a variety of scenarios. To achieve this, reinforcement learning (RL)…

Reinforcement learning (RL) has proven effective in strengthening the reasoning capabilities of large language models (LLMs). A widely adopted method, Group Relative Policy Optimization (GRPO), has shown strong empirical results in training…

Machine Learning · Computer Science 2026-03-11 Peter Chen, Xiaopeng Li, Ziniu Li, Xi Chen, Tianyi Lin

Instability and slowness are two main problems in deep reinforcement learning. Even if proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on…

Machine Learning · Computer Science 2019-10-01 Zhenyu Zhang, Xiangfeng Luo, Tong Liu, Shaorong Xie, Jianshu Wang, Wei Wang, Yang Li, Yan Peng

On-policy reinforcement learning methods, like Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), often demand extensive data per update, leading to sample inefficiency. This paper introduces Reflective Policy…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing

In recent years, reinforcement learning (RL) has gained increasing attention in control engineering. Especially, policy gradient methods are widely used. In this work, we improve the tracking performance of proximal policy optimization…

Machine Learning · Computer Science 2021-07-21 Jana Mayer, Johannes Westermann, Juan Pedro Gutiérrez H. Muriedas, Uwe Mettin, Alexander Lampe

Reinforcement Learning with Verifiable Rewards (RLVR) has markedly enhanced the reasoning abilities of large language models (LLMs). Its success, however, largely depends on strong base models with rich world knowledge, yielding only modest…

Artificial Intelligence · Computer Science 2025-08-19 Yongxin Guo, Wenbo Deng, Zhenglin Cheng, Xiaoying Tang

Tremendous progress has been made in reinforcement learning (RL) over the past decade. Most of these advancements came through the continual development of new algorithms, which were designed using a combination of mathematical derivations,…

Machine Learning · Computer Science 2022-10-14 Chris Lu, Jakub Grudzien Kuba, Alistair Letcher, Luke Metz, Christian Schroeder de Witt, Jakob Foerster

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact…

Machine Learning · Computer Science 2017-05-31 Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

Standard reinforcement learning from human feedback (RLHF) trains a reward model on pairwise preference data and then uses it for policy optimization. However, while reward models are optimized to capture relative preferences, existing…

Machine Learning · Computer Science 2026-02-05 Kyuseong Choi, Dwaipayan Saha, Woojeong Kim, Anish Agarwal, Raaz Dwivedi

Recent breakthroughs both in reinforcement learning and trajectory optimization have made significant advances towards real world robotic system deployment. Reinforcement learning (RL) can be applied to many problems without needing any…

Robotics · Computer Science 2019-10-23 Guillaume Bellegarda, Katie Byl

Hybrid Group Relative Policy Optimization (Hybrid GRPO) is a reinforcement learning framework that extends Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO) by incorporating empirical multi-sample action…

Machine Learning · Computer Science 2025-02-05 Soham Sane

Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied…

Machine Learning · Computer Science 2024-05-06 Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou

The Group Relative Policy Optimization (GRPO), a reinforcement learning method used to fine-tune large language models (LLMs), has proved its effectiveness in practical applications such as DeepSeek-R1. It raises a question whether GRPO can…

Machine Learning · Computer Science 2025-11-20 Yanchen Xu, Ziheng Jiao, Hongyuan Zhang, Xuelong Li