Related papers: Memory Augmented Self-Play

Augmented Replay Memory in Reinforcement Learning With Continuous Control

Online reinforcement learning agents are currently able to process an increasing amount of data by converting it into higher-order value functions. This expansion of the information collected from the environment increases the agent's…

Machine Learning · Computer Science 2021-02-04 Mirza Ramicic, Andrea Bonarini

A Survey on Self-play Methods in Reinforcement Learning

Self-play, a learning paradigm where agents iteratively refine their policies by interacting with historical or concurrent versions of themselves or other evolving agents, has shown remarkable success in solving complex non-cooperative…

Artificial Intelligence · Computer Science 2025-10-21 Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Wenhao Tang, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, Yu Wang

Learning To Explore With Predictive World Model Via Self-Supervised Learning

Autonomous artificial agents must be able to learn behaviors in complex environments without humans to design tasks and rewards. Designing these functions for each environment is not feasible, thus, motivating the development of intrinsic…

Machine Learning · Computer Science 2025-02-20 Alana Santana, Paula P. Costa, Esther L. Colombini

Brain-Like Replay Naturally Emerges in Reinforcement Learning Agents

Replay is a powerful strategy to promote learning in artificial intelligence and the brain. However, the conditions to generate it and its functional advantages have not been fully recognized. In this study, we develop a modular…

Systems and Control · Electrical Eng. & Systems 2024-10-08 Jiyi Wang, Likai Tang, Huimiao Chen, Marcelo G Mattar, Sen Song

Reinforcement Learning with Unsupervised Auxiliary Tasks

Deep reinforcement learning agents have achieved state-of-the-art results by directly maximising cumulative reward. However, environments contain a much wider variety of possible training signals. In this paper, we introduce an agent that…

Machine Learning · Computer Science 2016-11-17 Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver, Koray Kavukcuoglu

Self Punishment and Reward Backfill for Deep Q-Learning

Reinforcement learning agents learn by encouraging behaviours which maximize their total reward, usually provided by the environment. In many environments, however, the reward is provided after a series of actions rather than each single…

Artificial Intelligence · Computer Science 2022-01-04 Mohammad Reza Bonyadi, Rui Wang, Maryam Ziaei

Truthful Self-Play

We present a general framework for evolutionary learning of emergent, unbiased state representations without any supervision. Evolutionary frameworks such as self-play converge to bad local optima in the case of multi-agent reinforcement learning…

Machine Learning · Statistics 2023-02-03 Shohei Ohsawa

Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

We describe a simple scheme that allows an agent to learn about its environment in an unsupervised manner. Our scheme pits two versions of the same agent, Alice and Bob, against one another. Alice proposes a task for Bob to complete; and…

Machine Learning · Computer Science 2018-05-01 Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus

Experience Replay Optimization

Experience replay enables reinforcement learning agents to memorize and reuse past experiences, just as humans replay memories for the situation at hand. Contemporary off-policy algorithms either replay past experiences uniformly or utilize…

Machine Learning · Computer Science 2019-06-21 Daochen Zha, Kwei-Herng Lai, Kaixiong Zhou, Xia Hu

Language Self-Play For Data-Free Training

Large language models (LLMs) have advanced rapidly in recent years, driven by scale, abundant high-quality training data, and reinforcement learning. Yet this progress faces a fundamental bottleneck: the need for ever more data from which…

Artificial Intelligence · Computer Science 2025-12-22 Jakub Grudzien Kuba, Mengting Gu, Qi Ma, Yuandong Tian, Vijai Mohan, Jason Chen

Survey of Self-Play in Reinforcement Learning

In reinforcement learning (RL), the term self-play describes a kind of multi-agent learning (MAL) that deploys an algorithm against copies of itself to test compatibility in various stochastic environments. As is typical in MAL, the…

Computer Science and Game Theory · Computer Science 2021-07-08 Anthony DiGiovanni, Ethan C. Zell

Reinforcement Learning in R

Reinforcement learning refers to a group of methods from artificial intelligence where an agent performs learning through trial and error. It differs from supervised learning, since reinforcement learning requires no explicit labels;…

Machine Learning · Computer Science 2018-10-02 Nicolas Pröllochs, Stefan Feuerriegel

Learning offline: memory replay in biological and artificial reinforcement learning

Learning to act in an environment to maximise rewards is among the brain's key functions. This process has often been conceptualised within the framework of reinforcement learning, which has also gained prominence in machine learning and…

Machine Learning · Computer Science 2021-09-22 Emma L. Roscow, Raymond Chua, Rui Ponte Costa, Matt W. Jones, Nathan Lepora

Explaining Agent's Decision-making in a Hierarchical Reinforcement Learning Scenario

Reinforcement learning is a machine learning approach based on behavioral psychology. It is focused on learning agents that can acquire knowledge and learn to carry out new tasks by interacting with the environment. However, a problem…

Artificial Intelligence · Computer Science 2022-12-15 Hugo Muñoz, Ernesto Portugal, Angel Ayala, Bruno Fernandes, Francisco Cruz

Loss is its own Reward: Self-Supervision for Reinforcement Learning

Reinforcement learning optimizes policies for expected cumulative reward. Need the supervision be so narrow? Reward is delayed and sparse for many tasks, making it a difficult and impoverished signal for end-to-end optimization. To augment…

Machine Learning · Computer Science 2017-03-10 Evan Shelhamer, Parsa Mahmoudieh, Max Argus, Trevor Darrell

Meta-Learning to Explore via Memory Density Feedback

Exploration algorithms for reinforcement learning typically replace or augment the reward function with an additional "intrinsic" reward that trains the agent to seek previously unseen states of the environment. Here, we consider an…

Machine Learning · Computer Science 2025-09-30 Kevin McKee, Eric Alt, Andrew Grebenisan, Mick van Gelderen, Gary Miguel

Continual Learning of Control Primitives: Skill Discovery via Reset-Games

Reinforcement learning has the potential to automate the acquisition of behavior in complex settings, but in order for it to be successfully deployed, a number of practical challenges must be addressed. First, in real world settings, when…

Machine Learning · Computer Science 2020-11-11 Kelvin Xu, Siddharth Verma, Chelsea Finn, Sergey Levine

Emergent Complexity via Multi-Agent Competition

Reinforcement learning algorithms can train agents that solve problems in complex, interesting environments. Normally, the complexity of the trained agent is closely related to the complexity of the environment. This suggests that a highly…

Artificial Intelligence · Computer Science 2018-03-16 Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch

Unsupervised Control Through Non-Parametric Discriminative Rewards

Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research. We present an unsupervised learning algorithm to train agents to achieve…

Machine Learning · Computer Science 2018-11-29 David Warde-Farley, Tom Van de Wiele, Tejas Kulkarni, Catalin Ionescu, Steven Hansen, Volodymyr Mnih

Extending Environments To Measure Self-Reflection In Reinforcement Learning

We consider an extended notion of reinforcement learning in which the environment can simulate the agent and base its outputs on the agent's hypothetical behavior. Since good performance usually requires paying attention to whatever things…

Artificial Intelligence · Computer Science 2022-07-21 Samuel Allen Alexander, Michael Castaneda, Kevin Compher, Oscar Martinez