Related papers: Contextual Exploration Using a Linear Approximatio…
The contextual bandit problem, which is a type of reinforcement learning tasks, provides an effective framework for solving challenges in recommendation systems, such as satisfying real-time requirements, enabling personalization,…
A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world,…
Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires…
As reinforcement learning algorithms are being applied to increasingly complicated and realistic tasks, it is becoming increasingly difficult to solve such problems within a practical time frame. Hence, we focus on a \textit{satisficing}…
Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL.…
Deep reinforcement learning (DRL) has been proven its efficiency in capturing users' dynamic interests in recent literature. However, training a DRL agent is challenging, because of the sparse environment in recommender systems (RS), DRL…
Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal…
Balancing exploration and exploitation remains a key challenge in reinforcement learning (RL). State-of-the-art RL algorithms suffer from high sample complexity, particularly in the sparse reward case, where they can do no better than to…
We present a new model-based algorithm for reinforcement learning (RL) which consists of explicit exploration and exploitation phases, and is applicable in large or infinite state spaces. The algorithm maintains a set of dynamics models…
Sparse-reward reinforcement learning (RL) can model a wide range of highly complex tasks. Solving sparse-reward tasks is RL's core premise, requiring efficient exploration coupled with long-horizon credit assignment, and overcoming these…
Reinforcement Learning (RL) traditionally relies on scalar reward signals, limiting its ability to leverage the rich semantic knowledge often available in real-world tasks. In contrast, humans learn efficiently by combining numerical…
Exploration is a fundamental challenge in reinforcement learning (RL). Many of the current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical…
Efficient exploration remains a challenging research problem in reinforcement learning, especially when an environment contains large state spaces, deceptive local optima, or sparse rewards. To tackle this problem, we present a…
The objective of a reinforcement learning agent is to discover better actions through exploration. However, typical exploration techniques aim to maximize rewards, often incurring high costs in both exploration and learning processes. We…
A burgeoning area within reinforcement learning (RL) is the design of sequential decision-making agents centered around large language models (LLMs). While autonomous decision-making agents powered by modern LLMs could facilitate numerous…
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training. It is challenging to identify appropriate constraint specifications due to the undefined…
Model-free continuous control for robot navigation tasks using Deep Reinforcement Learning (DRL) that relies on noisy policies for exploration is sensitive to the density of rewards. In practice, robots are usually deployed in cluttered…
Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies…
Language model alignment (or, reinforcement learning) techniques that leverage active exploration -- deliberately encouraging the model to produce diverse, informative responses -- offer the promise of super-human capabilities. However,…
Reinforcement learning (RL) and Deep Reinforcement Learning (DRL), in particular, have the potential to disrupt and are already changing the way we interact with the world. One of the key indicators of their applicability is their ability…