Related papers: Experience Replay Optimization

Remember and Forget for Experience Replay

Experience replay (ER) is a fundamental component of off-policy deep reinforcement learning (RL). ER recalls experiences from past iterations to compute gradient estimates for the current policy, increasing data-efficiency. However, the…

Machine Learning · Computer Science · 2019-05-21 · Guido Novati, Petros Koumoutsakos

Reinforcement Learning in Economics and Finance

Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process through repeated experience. In a given environment, the agent's policy provides it with some running and terminal…

Theoretical Economics · Economics · 2020-03-24 · Arthur Charpentier, Romuald Elie, Carl Remlinger

Augmented Replay Memory in Reinforcement Learning With Continuous Control

Online reinforcement learning agents are currently able to process an increasing amount of data by converting it into higher-order value functions. This expansion of the information collected from the environment increases the agent's…

Machine Learning · Computer Science · 2021-02-04 · Mirza Ramicic, Andrea Bonarini

CUER: Corrected Uniform Experience Replay for Off-Policy Continuous Deep Reinforcement Learning Algorithms

The utilization of the experience replay mechanism enables agents to effectively leverage their experiences on several occasions. In previous studies, the sampling probability of the transitions was modified based on their relative…

Machine Learning · Computer Science · 2024-06-14 · Arda Sarp Yenicesu, Furkan B. Mutlu, Suleyman S. Kozat, Ozgur S. Oguz

Experience Replay Using Transition Sequences

Experience replay is one of the most commonly used approaches to improve the sample efficiency of reinforcement learning algorithms. In this work, we propose an approach to select and replay sequences of transitions in order to accelerate…

Artificial Intelligence · Computer Science · 2022-09-29 · Thommen George Karimpanal, Roland Bouffanais

Revisiting Experience Replayable Conditions

Experience replay (ER) used in (deep) reinforcement learning is considered to be applicable only to off-policy algorithms. However, there have been some cases in which ER has been applied to on-policy algorithms, suggesting that…

Machine Learning · Computer Science · 2024-09-16 · Taisuke Kobayashi

On-Policy Trust Region Policy Optimisation with Replay Buffers

Building upon the recent success of deep reinforcement learning methods, we investigate the possibility of improving on-policy reinforcement learning by reusing the data from several consecutive policies. On-policy methods bring many…

Machine Learning · Computer Science · 2019-01-21 · Dmitry Kangin, Nicolas Pugeault

Replay For Safety

Experience replay (Lin, 1993; Mnih et al., 2015) is a widely used technique to achieve efficient use of data and improved performance in RL algorithms. In experience replay, past transitions are stored in a memory buffer and…

Machine Learning · Computer Science · 2021-12-09 · Liran Szlak, Ohad Shamir

ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay

Training large language models (LLMs) as interactive agents for controlling graphical user interfaces (GUIs) presents a unique challenge to optimize long-horizon action sequences with multimodal feedback from complex environments. While…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-23 · Fanbin Lu, Zhisheng Zhong, Shu Liu, Chi-Wing Fu, Jiaya Jia

The Effects of Memory Replay in Reinforcement Learning

Experience replay is a key technique behind many recent advances in deep reinforcement learning. Allowing the agent to learn from earlier memories can speed up learning and break undesirable temporal correlations. Despite its widespread…

Artificial Intelligence · Computer Science · 2017-10-19 · Ruishan Liu, James Zou

MAC-PO: Multi-Agent Experience Replay via Collective Priority Optimization

Experience replay is crucial for off-policy reinforcement learning (RL) methods. By remembering and reusing the experiences from different past policies, experience replay significantly improves the training efficiency and stability of RL…

Machine Learning · Computer Science · 2023-03-01 · Yongsheng Mei, Hanhan Zhou, Tian Lan, Guru Venkataramani, Peng Wei

Reward Constrained Policy Optimization

Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to maximize the accumulated reward, it often learns to exploit loopholes and misspecifications in the reward signal, resulting in unwanted behavior. While…

Machine Learning · Computer Science · 2018-12-27 · Chen Tessler, Daniel J. Mankowitz, Shie Mannor

On-Policy RL with Optimal Reward Baseline

Reinforcement learning algorithms are fundamental to align large language models with human preferences and to enhance their reasoning capabilities. However, current reinforcement learning algorithms often suffer from training instability…

Machine Learning · Computer Science · 2025-06-05 · Yaru Hao, Li Dong, Xun Wu, Shaohan Huang, Zewen Chi, Furu Wei

OER: Offline Experience Replay for Continual Offline Reinforcement Learning

The capability of continuously learning new skills via a sequence of pre-collected offline datasets is desired for an agent. However, consecutively learning a sequence of offline tasks likely leads to the catastrophic forgetting issue under…

Machine Learning · Computer Science · 2024-04-23 · Sibo Gai, Donglin Wang, Li He

Hindsight Experience Replay Accelerates Proximal Policy Optimization

Hindsight experience replay (HER) accelerates off-policy reinforcement learning algorithms for environments that emit sparse rewards by modifying the goal of the episode post-hoc to be some state achieved during the episode. Because…

Machine Learning · Computer Science · 2024-10-31 · Douglas C. Crowder, Darrien M. McKenzie, Matthew L. Trappett, Frances S. Chance

Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms

Learning in high-dimensional continuous tasks is challenging, particularly when the experience replay memory is very limited. We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains…

Machine Learning · Computer Science · 2022-07-28 · Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

Match or Replay: Self Imitating Proximal Policy Optimization

Reinforcement Learning (RL) agents often struggle with inefficient exploration, particularly in environments with sparse rewards. Traditional exploration strategies can lead to slow learning and suboptimal performance because agents fail to…

Machine Learning · Computer Science · 2026-03-31 · Gaurav Chaudhary, Laxmidhar Behera, Washim Uddin Mondal

Variance Reduction based Experience Replay for Policy Optimization

For reinforcement learning on complex stochastic systems where many factors dynamically impact the output trajectories, it is desirable to effectively leverage the information from historical samples collected in previous iterations to…

Machine Learning · Statistics · 2022-09-13 · Hua Zheng, Wei Xie, M. Ben Feng

Reflective Policy Optimization

On-policy reinforcement learning methods, like Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), often demand extensive data per update, leading to sample inefficiency. This paper introduces Reflective Policy…

Machine Learning · Computer Science · 2024-06-07 · Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing

Revisiting Prioritized Experience Replay: A Value Perspective

Experience replay enables off-policy reinforcement learning (RL) agents to utilize past experiences to maximize the cumulative reward. Prioritized experience replay that weighs experiences by the magnitude of their temporal-difference error…

Machine Learning · Computer Science · 2021-02-08 · Ang A. Li, Zongqing Lu, Chenglin Miao
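Two of the abstracts in this listing outline the shared mechanism: in experience replay, past transitions are stored in a memory buffer and reused for updates, and prioritized experience replay weighs stored experiences by the magnitude of their temporal-difference (TD) error. The following is a minimal Python sketch of that TD-error baseline, assuming the proportional variant described by Schaul et al.: priority p_i = (|δ_i| + ε)^α, sampling probability P(i) = p_i / Σ_k p_k, and importance-sampling weights w_i = (N·P(i))^(−β) normalized by the batch maximum. The names `Transition` and `PrioritizedReplayBuffer` and the defaults for `alpha`, `beta`, and `eps` are illustrative assumptions; this is not the method proposed by any of the listed papers.

```python
import random
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass
class Transition:
    # Hypothetical container for one (s, a, r, s', done) step.
    state: Any
    action: Any
    reward: float
    next_state: Any
    done: bool


class PrioritizedReplayBuffer:
    """Illustrative proportional prioritized replay with O(N) sampling.

    Assumed scheme: p_i = (|td_error| + eps) ** alpha, P(i) = p_i / sum_k p_k,
    w_i = (N * P(i)) ** (-beta), divided by the largest weight in the batch.
    """

    def __init__(self, capacity: int, alpha: float = 0.6,
                 eps: float = 1e-6, seed: int = 0):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps
        self.rng = random.Random(seed)
        self.data: List[Transition] = []
        self.priorities: List[float] = []
        self.pos = 0  # slot to overwrite next once the buffer is full (FIFO eviction)

    def add(self, transition: Transition) -> None:
        # New transitions get the current maximum priority, so each one is
        # likely to be replayed at least once before its TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4
               ) -> Tuple[List[int], List[Transition], List[float]]:
        n = len(self.data)
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        # Draw indices with replacement, proportional to priority.
        idx = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, indices: Sequence[int],
                          td_errors: Sequence[float]) -> None:
        # Larger |TD error| -> higher priority; eps keeps every p_i > 0.
        for i, delta in zip(indices, td_errors):
            self.priorities[i] = (abs(delta) + self.eps) ** self.alpha
```

Sampling here is O(N) per batch for readability; larger buffers typically store priorities in a sum tree so that sampling and priority updates run in O(log N). In a training loop, one would call `sample`, compute TD errors for the batch, scale each loss term by its weight, and pass the new TD errors back to `update_priorities`.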