Related papers: Augmenting Unsupervised Reinforcement Learning wit…
While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interaction with the environment, learning from limited interaction remains a key challenge. We posit that an…
Reinforcement learning (RL) systems have countless applications, from energy-grid management to protein design. However, such real-world scenarios are often extremely difficult, combinatorial in nature, and require complex coordination…
Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive,…
Deep pretrained language models have achieved great success in the way of pretraining first and then fine-tuning. But such a sequential transfer learning paradigm often confronts the catastrophic forgetting problem and leads to sub-optimal…
Intelligent agents should have the ability to leverage knowledge from previously learned tasks in order to learn new ones quickly and efficiently. Meta-learning approaches have emerged as a popular solution to achieve this. However,…
Reinforcement learning (RL) is a branch of machine learning which is employed to solve various sequential decision making problems without proper supervision. Due to the recent advancement of deep learning, the newly proposed Deep-RL…
Preference-based reinforcement learning (RL) has shown potential for teaching agents to perform the target tasks without a costly, pre-defined reward function by learning the reward with a supervisor's preference between the two agent…
Deep reinforcement learning agents have achieved state-of-the-art results by directly maximising cumulative reward. However, environments contain a much wider variety of possible training signals. In this paper, we introduce an agent that…
Standard reinforcement learning (RL) for large language model (LLM) agents typically optimizes extrinsic rewards, prioritizing isolated task completion over continual adaptation. Consequently, agents often converge to suboptimal policies…
In an unfamiliar setting, a model-based reinforcement learning agent can be limited by the accuracy of its world model. In this work, we present a novel, training-free approach to improving the performance of such agents separately from…
The popularity of Large Language Models (LLMs) have unleashed a new age ofLanguage Agents for solving a diverse range of tasks. While contemporary frontier LLMs are capable enough to power reasonably good Language agents, the closed-API…
While unsupervised skill discovery has shown promise in autonomously acquiring behavioral primitives, there is still a large methodological disconnect between task-agnostic skill pretraining and downstream, task-aware finetuning. We present…
Reinforcement learning (RL) requires skillful definition and remarkable computational efforts to solve optimization and control problems, which could impair its prospect. Introducing human guidance into reinforcement learning is a promising…
Meta-learning algorithms use past experience to learn to quickly solve new tasks. In the context of reinforcement learning, meta-learning algorithms acquire reinforcement learning procedures to solve new problems more efficiently by…
In reinforcement learning (RL), experience replay-based sampling techniques play a crucial role in promoting convergence by eliminating spurious correlations. However, widely used methods such as uniform experience replay (UER) and…
Reinforcement learning agents learn by encouraging behaviours which maximize their total reward, usually provided by the environment. In many environments, however, the reward is provided after a series of actions rather than each single…
In self-supervised learning, a system is tasked with achieving a surrogate objective by defining alternative targets on a set of unlabeled data. The aim is to build useful representations that can be used in downstream tasks, without costly…
Modern large language models (LLMs) are often evaluated and deployed under a one-shot, greedy inference protocol, especially in professional settings that require deterministic behavior. This regime can systematically under-estimate a fixed…
Online reinforcement learning agents are currently able to process an increasing amount of data by converting it into a higher order value functions. This expansion of the information collected from the environment increases the agent's…
As machine learning has moved towards leveraging large models as priors for downstream tasks, the community has debated the right form of prior for solving reinforcement learning (RL) problems. If one were to try to prefetch as much…