Related papers: Evolutionary Policy Optimization

Evolutionary Policy Optimization

A key challenge in reinforcement learning (RL) is managing the exploration-exploitation trade-off without sacrificing sample efficiency. Policy gradient (PG) methods excel in exploitation through fine-grained, gradient-based optimization…

Machine Learning · Computer Science · 2025-04-18 · Zelal Su "Lain" Mustafaoglu, Keshav Pingali, Risto Miikkulainen

Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts

Reinforcement learning (RL) has been successfully applied to solve the problem of finding obstacle-free paths for autonomous agents operating in stochastic and uncertain environments. However, when the underlying stochastic dynamics of the…

Machine Learning · Computer Science · 2024-10-29 · Sheryl Paul, Jyotirmoy V. Deshmukh

Exchange Policy Optimization Algorithm for Semi-Infinite Safe Reinforcement Learning

Safe reinforcement learning (safe RL) aims to respect safety requirements while optimizing long-term performance. In many practical applications, however, the problem involves an infinite number of constraints, known as semi-infinite safe…

Machine Learning · Computer Science · 2025-11-07 · Jiaming Zhang, Yujie Yang, Haoning Wang, Liping Zhang, Shengbo Eben Li

Towards Applicable Reinforcement Learning: Improving the Generalization and Sample Efficiency with Policy Ensemble

It is challenging for reinforcement learning (RL) algorithms to succeed in real-world applications such as financial trading and logistics systems due to noisy observations and environment shifts between training and evaluation. Thus, it…

Machine Learning · Computer Science · 2022-05-20 · Zhengyu Yang, Kan Ren, Xufang Luo, Minghuan Liu, Weiqing Liu, Jiang Bian, Weinan Zhang, Dongsheng Li

Competitiveness of MAP-Elites against Proximal Policy Optimization on locomotion tasks in deterministic simulations

The increasing importance of robots and automation creates a demand for learnable controllers which can be obtained through various approaches such as Evolutionary Algorithms (EAs) or Reinforcement Learning (RL). Unfortunately, these two…

Artificial Intelligence · Computer Science · 2020-09-22 · Szymon Brych, Antoine Cully

Qualitative Differences Between Evolutionary Strategies and Reinforcement Learning Methods for Control of Autonomous Agents

In this paper we analyze the qualitative differences between evolutionary strategies and reinforcement learning algorithms by focusing on two popular state-of-the-art algorithms: the OpenAI-ES evolutionary strategy and the Proximal Policy…

Artificial Intelligence · Computer Science · 2022-05-17 · Nicola Milano, Stefano Nolfi

Agentic Entropy-Balanced Policy Optimization

Recently, Agentic Reinforcement Learning (Agentic RL) has made significant progress in incentivizing the multi-turn, long-horizon tool-use capabilities of web agents. While mainstream agentic RL algorithms autonomously explore…

Machine Learning · Computer Science · 2025-10-17 · Guanting Dong, Licheng Bao, Zhongyuan Wang, Kangzhi Zhao, Xiaoxi Li, Jiajie Jin, Jinghan Yang, Hangyu Mao, Fuzheng Zhang, Kun Gai, Guorui Zhou, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou

EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning

Large Language Models (LLMs) have shown impressive reasoning capabilities in well-defined problems with clear solutions, such as mathematics and coding. However, they still struggle with complex real-world scenarios like business…

Computation and Language · Computer Science · 2025-05-29 · Xiaoqian Liu, Ke Wang, Yongbin Li, Yuchuan Wu, Wentao Ma, Aobo Kong, Fei Huang, Jianbin Jiao, Junge Zhang

Evolution-Guided Policy Gradient in Reinforcement Learning

Deep Reinforcement Learning (DRL) algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically suffer from three core difficulties: temporal credit assignment with sparse rewards, lack…

Machine Learning · Computer Science · 2018-10-30 · Shauharda Khadka, Kagan Tumer

ERL-Re²: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation

Deep Reinforcement Learning (Deep RL) and Evolutionary Algorithms (EA) are two major paradigms of policy optimization with distinct learning principles, i.e., gradient-based vs. gradient-free. An appealing research direction is integrating…

Neural and Evolutionary Computing · Computer Science · 2023-07-03 · Jianye Hao, Pengyi Li, Hongyao Tang, Yan Zheng, Xian Fu, Zhaopeng Meng

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Training LLM agents in multi-turn environments with sparse rewards, where completing a single task requires 30+ turns of interaction within an episode, presents a fundamental challenge for reinforcement learning. We identify a critical…

Machine Learning · Computer Science · 2026-02-11 · Wujiang Xu, Wentian Zhao, Zhenting Wang, Yu-Jhe Li, Can Jin, Mingyu Jin, Kai Mei, Kun Wan, Dimitris N. Metaxas

Robust Policy Optimization in Deep Reinforcement Learning

The policy gradient method enjoys the simplicity of the objective where the agent optimizes the cumulative reward directly. Moreover, in the continuous action domain, a parameterized action distribution allows easy control of…

Machine Learning · Computer Science · 2022-12-16 · Md Masudur Rahman, Yexiang Xue

Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization

Exploration remains the key bottleneck for large language model agents trained with reinforcement learning. While prior methods exploit pretrained knowledge, they fail in environments requiring the discovery of novel states. We propose…

Machine Learning · Computer Science · 2026-03-09 · Zeyuan Liu, Jeonghye Kim, Xufang Luo, Dongsheng Li, Yuqing Yang

Proximal Policy Optimization with Evolutionary Mutations

Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm known for its stability and sample efficiency, but it often suffers from premature convergence due to limited exploration. In this paper, we propose POEM…

Neural and Evolutionary Computing · Computer Science · 2026-01-22 · Casimir Czworkowski, Stephen Hornish, Alhassan S. Yasin

EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance

Large language models (LLMs) have recently advanced in reasoning when optimized with reinforcement learning (RL) under verifiable rewards. Existing methods primarily rely on outcome-based supervision to strengthen internal LLM reasoning,…

Artificial Intelligence · Computer Science · 2026-05-29 · Siyao Song, Cong Ma, Zhihao Cheng, Shiye Lei, Minghao Li, Ying Zeng, Huaixiao Tou, Kai Jia

Adversarial Policy Optimization in Deep Reinforcement Learning

A policy represented by a deep neural network can overfit spurious features in observations, which hampers a reinforcement learning agent from learning an effective policy. This issue becomes severe in high-dimensional state spaces, where the…

Machine Learning · Computer Science · 2023-05-01 · Md Masudur Rahman, Yexiang Xue

On-Policy RL with Optimal Reward Baseline

Reinforcement learning algorithms are fundamental to aligning large language models with human preferences and to enhancing their reasoning capabilities. However, current reinforcement learning algorithms often suffer from training instability…

Machine Learning · Computer Science · 2025-06-05 · Yaru Hao, Li Dong, Xun Wu, Shaohan Huang, Zewen Chi, Furu Wei

ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm

Deep reinforcement learning has been able to solve various tasks successfully; however, due to the construction of policy gradient and training dynamics, tuning deep reinforcement learning models remains challenging. As one of the most…

Machine Learning · Computer Science · 2026-02-11 · Hanyong Wang, Menglong Yang

SPO: Sequential Monte Carlo Policy Optimisation

Leveraging planning during learning and decision-making is central to the long-term development of intelligent agents. Recent works have successfully combined tree-based search methods and self-play learning mechanisms to this end. However,…

Artificial Intelligence · Computer Science · 2024-11-01 · Matthew V Macfarlane, Edan Toledo, Donal Byrne, Paul Duckworth, Alexandre Laterre

Evidence-Augmented Policy Optimization with Reward Co-Evolution for Long-Context Reasoning

While Reinforcement Learning (RL) has advanced LLM reasoning, applying it to long-context scenarios is hindered by sparsity of outcome rewards. This limitation fails to penalize ungrounded "lucky guesses," leaving the critical process of…

Artificial Intelligence · Computer Science · 2026-04-21 · Xin Guan, Zijian Li, Shen Huang, Pengjun Xie, Jingren Zhou, Jiuxin Cao