Related papers: Reinforcement Learning on Web Interfaces Using Wor…

Learning to Navigate the Web

Learning in environments with large state and action spaces, and sparse rewards, can hinder a Reinforcement Learning (RL) agent's learning through trial-and-error. For instance, following natural language instructions on the Web (such as…

Machine Learning · Computer Science 2018-12-24 Izzeddin Gur, Ulrich Rueckert, Aleksandra Faust, Dilek Hakkani-Tur

Overcoming Exploration in Reinforcement Learning with Demonstrations

Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal…

Machine Learning · Computer Science 2018-02-27 Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel

Intrinsically Motivated Reinforcement Learning based Recommendation with Counterfactual Data Augmentation

Deep reinforcement learning (DRL) has been proven its efficiency in capturing users' dynamic interests in recent literature. However, training a DRL agent is challenging, because of the sparse environment in recommender systems (RS), DRL…

Information Retrieval · Computer Science 2022-09-20 Xiaocong Chen, Siyu Wang, Lina Yao, Lianyong Qi, Yong Li

Learning Diverse Policies with Soft Self-Generated Guidance

Reinforcement learning (RL) with sparse and deceptive rewards is challenging because non-zero rewards are rarely obtained. Hence, the gradient calculated by the agent can be stochastic and without valid information. Recent studies that…

Machine Learning · Computer Science 2024-02-08 Guojian Wang, Faguo Wu, Xiao Zhang, Jianxiang Liu

Exploration via Planning for Information about the Optimal Trajectory

Many potential applications of reinforcement learning (RL) are stymied by the large numbers of samples required to learn an effective policy. This is especially true when applying RL to real-world control tasks, e.g. in the sciences or…

Machine Learning · Computer Science 2022-10-11 Viraj Mehta, Ian Char, Joseph Abbate, Rory Conlin, Mark D. Boyer, Stefano Ermon, Jeff Schneider, Willie Neiswanger

Computationally Efficient Reinforcement Learning: Targeted Exploration leveraging Simple Rules

Model-free Reinforcement Learning (RL) generally suffers from poor sample complexity, mostly due to the need to exhaustively explore the state-action space to find well-performing policies. On the other hand, we postulate that expert…

Machine Learning · Computer Science 2023-09-13 Loris Di Natale, Bratislav Svetozarevic, Philipp Heer, Colin N. Jones

How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1

Deep Research agents tackle knowledge-intensive tasks through multi-round retrieval and decision-oriented generation. While reinforcement learning (RL) has been shown to improve performance in this paradigm, its contributions remain…

Computation and Language · Computer Science 2026-02-24 Yinuo Xu, Shuo Lu, Jianjie Cheng, Meng Wang, Qianlong Xie, Xingxing Wang, Ran He, Jian Liang

Reinforcement Learning for Machine Learning Engineering Agents

Existing agents for solving tasks such as ML engineering rely on prompting powerful language models. As a result, these agents do not improve with more experience. In this paper, we show that agents backed by weaker models that improve via…

Machine Learning · Computer Science 2025-09-04 Sherry Yang, Joy He-Yueya, Percy Liang

Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning

Reinforcement Learning (RL) enables an intelligent agent to optimise its performance in a task by continuously taking action from an observed state and receiving a feedback from the environment in form of rewards. RL typically uses tables…

Artificial Intelligence · Computer Science 2025-01-28 Alberto Castagna

Guiding Pretraining in Reinforcement Learning with Large Language Models

Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions,…

Machine Learning · Computer Science 2023-09-18 Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas

Guiding Exploration in Reinforcement Learning Through LLM-Augmented Observations

Reinforcement Learning (RL) agents often struggle in sparse-reward environments where traditional exploration strategies fail to discover effective action sequences. Large Language Models (LLMs) possess procedural knowledge and reasoning…

Machine Learning · Computer Science 2025-10-13 Vaibhav Jain, Gerrit Grossmann

A Comprehensive Survey of Reinforcement Learning: From Algorithms to Practical Challenges

Reinforcement Learning (RL) has emerged as a powerful paradigm in Artificial Intelligence (AI), enabling agents to learn optimal behaviors through interactions with their environments. Drawing from the foundations of trial and error, RL…

Artificial Intelligence · Computer Science 2025-02-04 Majid Ghasemi, Amir Hossein Moosavi, Dariush Ebrahimi

Human-Inspired Framework to Accelerate Reinforcement Learning

Reinforcement learning (RL) is crucial for data science decision-making but suffers from sample inefficiency, particularly in real-world scenarios with costly physical interactions. This paper introduces a novel human-inspired framework to…

Machine Learning · Computer Science 2024-03-13 Ali Beikmohammadi, Sindri Magnússon

Reinforcement Learning with Lookahead Information

We study reinforcement learning (RL) problems in which agents observe the reward or transition realizations at their current state before deciding which action to take. Such observations are available in many applications, including…

Machine Learning · Computer Science 2024-10-22 Nadav Merlis

Reinforcement Learning

Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains, e.g., board games, video games or autonomous vehicles. In such problems, an agent faces a sequential decision-making…

Machine Learning · Computer Science 2020-06-16 Olivier Buffet, Olivier Pietquin, Paul Weng

Improve the efficiency of deep reinforcement learning through semantic exploration guided by natural language

Reinforcement learning is a powerful technique for learning from trial and error, but it often requires a large number of interactions to achieve good performance. In some domains, such as sparse-reward tasks, an oracle that can provide…

Artificial Intelligence · Computer Science 2023-09-22 Zhourui Guo, Meng Yao, Yang Yu, Qiyue Yin

Safe Reinforcement Learning with Minimal Supervision

Reinforcement learning (RL) in the real world necessitates the development of procedures that enable agents to explore without causing harm to themselves or others. The most successful solutions to the problem of safe RL leverage offline…

Machine Learning · Computer Science 2025-01-09 Alexander Quessy, Thomas Richardson, Sebastian East

An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents

Reinforcement learning (RL) has demonstrated strong potential in training large language models (LLMs) capable of complex reasoning for real-world problem solving. More recently, RL has been leveraged to create sophisticated LLM-based…

Computation and Language · Computer Science 2025-05-22 Bowen Jin, Jinsung Yoon, Priyanka Kargupta, Sercan O. Arik, Jiawei Han

Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration

Recent advancements in deep reinforcement learning (RL) have demonstrated notable progress in sample efficiency, spanning both model-based and model-free paradigms. Despite the identification and mitigation of specific bottlenecks in prior…

Machine Learning · Computer Science 2024-04-02 Yibo Wang, Jiang Zhao

Retrieval-Augmented Reinforcement Learning

Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive,…

Machine Learning · Computer Science 2022-05-25 Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza, Peter C. Humphreys, Ksenia Konyushkova, Laurent Sifre, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess, Charles Blundell