Related papers: Evolutionary Policy Optimization

200 papers

A key challenge in reinforcement learning (RL) is managing the exploration-exploitation trade-off without sacrificing sample efficiency. Policy gradient (PG) methods excel in exploitation through fine-grained, gradient-based optimization…

Machine Learning · Computer Science 2025-04-18 Zelal Su "Lain" Mustafaoglu , Keshav Pingali , Risto Miikkulainen

Reinforcement learning (RL) has been successfully applied to solve the problem of finding obstacle-free paths for autonomous agents operating in stochastic and uncertain environments. However, when the underlying stochastic dynamics of the…

Machine Learning · Computer Science 2024-10-29 Sheryl Paul , Jyotirmoy V. Deshmukh

Safe reinforcement learning (safe RL) aims to respect safety requirements while optimizing long-term performance. In many practical applications, however, the problem involves an infinite number of constraints, known as semi-infinite safe…

Machine Learning · Computer Science 2025-11-07 Jiaming Zhang , Yujie Yang , Haoning Wang , Liping Zhang , Shengbo Eben Li

It is challenging for reinforcement learning (RL) algorithms to succeed in real-world applications like financial trading and logistics systems due to noisy observations and environment shifts between training and evaluation. Thus, it…

Machine Learning · Computer Science 2022-05-20 Zhengyu Yang , Kan Ren , Xufang Luo , Minghuan Liu , Weiqing Liu , Jiang Bian , Weinan Zhang , Dongsheng Li

The increasing importance of robots and automation creates a demand for learnable controllers which can be obtained through various approaches such as Evolutionary Algorithms (EAs) or Reinforcement Learning (RL). Unfortunately, these two…

Artificial Intelligence · Computer Science 2020-09-22 Szymon Brych , Antoine Cully

In this paper we analyze the qualitative differences between evolutionary strategies and reinforcement learning algorithms by focusing on two popular state-of-the-art algorithms: the OpenAI-ES evolutionary strategy and the Proximal Policy…

Artificial Intelligence · Computer Science 2022-05-17 Nicola Milano , Stefano Nolfi

Recently, Agentic Reinforcement Learning (Agentic RL) has made significant progress in incentivizing the multi-turn, long-horizon tool-use capabilities of web agents. While mainstream agentic RL algorithms autonomously explore…

Large Language Models (LLMs) have shown impressive reasoning capabilities in well-defined problems with clear solutions, such as mathematics and coding. However, they still struggle with complex real-world scenarios like business…

Computation and Language · Computer Science 2025-05-29 Xiaoqian Liu , Ke Wang , Yongbin Li , Yuchuan Wu , Wentao Ma , Aobo Kong , Fei Huang , Jianbin Jiao , Junge Zhang

Deep Reinforcement Learning (DRL) algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically suffer from three core difficulties: temporal credit assignment with sparse rewards, lack…

Machine Learning · Computer Science 2018-10-30 Shauharda Khadka , Kagan Tumer

Deep Reinforcement Learning (Deep RL) and Evolutionary Algorithms (EA) are two major paradigms of policy optimization with distinct learning principles, i.e., gradient-based vs. gradient-free. An appealing research direction is integrating…

Neural and Evolutionary Computing · Computer Science 2023-07-03 Jianye Hao , Pengyi Li , Hongyao Tang , Yan Zheng , Xian Fu , Zhaopeng Meng

Training LLM agents in multi-turn environments with sparse rewards, where completing a single task requires 30+ turns of interaction within an episode, presents a fundamental challenge for reinforcement learning. We identify a critical…

Machine Learning · Computer Science 2026-02-11 Wujiang Xu , Wentian Zhao , Zhenting Wang , Yu-Jhe Li , Can Jin , Mingyu Jin , Kai Mei , Kun Wan , Dimitris N. Metaxas

The policy gradient method enjoys the simplicity of the objective where the agent optimizes the cumulative reward directly. Moreover, in the continuous action domain, a parameterized action distribution allows easy control of…

Machine Learning · Computer Science 2022-12-16 Md Masudur Rahman , Yexiang Xue

Exploration remains the key bottleneck for large language model agents trained with reinforcement learning. While prior methods exploit pretrained knowledge, they fail in environments requiring the discovery of novel states. We propose…

Machine Learning · Computer Science 2026-03-09 Zeyuan Liu , Jeonghye Kim , Xufang Luo , Dongsheng Li , Yuqing Yang

Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm known for its stability and sample efficiency, but it often suffers from premature convergence due to limited exploration. In this paper, we propose POEM…

Neural and Evolutionary Computing · Computer Science 2026-01-22 Casimir Czworkowski , Stephen Hornish , Alhassan S. Yasin

Large language models (LLMs) have recently advanced in reasoning when optimized with reinforcement learning (RL) under verifiable rewards. Existing methods primarily rely on outcome-based supervision to strengthen internal LLM reasoning,…

Artificial Intelligence · Computer Science 2026-05-29 Siyao Song , Cong Ma , Zhihao Cheng , Shiye Lei , Minghao Li , Ying Zeng , Huaixiao Tou , Kai Jia

The policy represented by the deep neural network can overfit the spurious features in observations, which hampers a reinforcement learning agent from learning an effective policy. This issue becomes severe in high-dimensional state spaces, where the…

Machine Learning · Computer Science 2023-05-01 Md Masudur Rahman , Yexiang Xue

Reinforcement learning algorithms are fundamental for aligning large language models with human preferences and for enhancing their reasoning capabilities. However, current reinforcement learning algorithms often suffer from training instability…

Machine Learning · Computer Science 2025-06-05 Yaru Hao , Li Dong , Xun Wu , Shaohan Huang , Zewen Chi , Furu Wei

Deep reinforcement learning has been able to solve various tasks successfully; however, due to the construction of policy gradient and training dynamics, tuning deep reinforcement learning models remains challenging. As one of the most…

Machine Learning · Computer Science 2026-02-11 Hanyong Wang , Menglong Yang

Leveraging planning during learning and decision-making is central to the long-term development of intelligent agents. Recent works have successfully combined tree-based search methods and self-play learning mechanisms to this end. However,…

Artificial Intelligence · Computer Science 2024-11-01 Matthew V Macfarlane , Edan Toledo , Donal Byrne , Paul Duckworth , Alexandre Laterre

While Reinforcement Learning (RL) has advanced LLM reasoning, applying it to long-context scenarios is hindered by the sparsity of outcome rewards. This limitation fails to penalize ungrounded "lucky guesses," leaving the critical process of…

Artificial Intelligence · Computer Science 2026-04-21 Xin Guan , Zijian Li , Shen Huang , Pengjun Xie , Jingren Zhou , Jiuxin Cao