Related papers: Sample Efficient Reinforcement Learning by Automat…

Reward-Machine-Guided, Self-Paced Reinforcement Learning

Self-paced reinforcement learning (RL) aims to improve the data efficiency of learning by automatically creating sequences, namely curricula, of probability distributions over contexts. However, existing techniques for self-paced RL fail in…

Machine Learning · Computer Science · 2023-05-29 · Cevahir Koprulu, Ufuk Topcu

Human-Inspired Framework to Accelerate Reinforcement Learning

Reinforcement learning (RL) is crucial for data science decision-making but suffers from sample inefficiency, particularly in real-world scenarios with costly physical interactions. This paper introduces a novel human-inspired framework to…

Machine Learning · Computer Science · 2024-03-13 · Ali Beikmohammadi, Sindri Magnússon

Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models

Low sample efficiency is an enduring challenge of reinforcement learning (RL). With the advent of versatile large language models (LLMs), recent works impart common-sense knowledge to accelerate policy learning for RL processes. However, we…

Computation and Language · Computer Science · 2024-07-08 · Fuxiang Zhang, Junyou Li, Yi-Chen Li, Zongzhang Zhang, Yang Yu, Deheng Ye

Efficient Reinforcement Learning for Unsupervised Controlled Text Generation

Controlled text generation tasks such as unsupervised text style transfer have increasingly adopted the use of Reinforcement Learning (RL). A major challenge in applying RL to such tasks is the sparse reward, which is available only after…

Computation and Language · Computer Science · 2022-04-19 · Bhargav Upadhyay, Akhilesh Sudhakar, Arjun Maheswaran

Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity

Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications, but in practice the choice of reward function can be crucial for good results -- while in principle the reward only needs…

Machine Learning · Computer Science · 2022-10-19 · Abhishek Gupta, Aldo Pacchiano, Yuexiang Zhai, Sham M. Kakade, Sergey Levine

Hierarchical Reinforcement Learning with Hindsight

Reinforcement Learning (RL) algorithms can suffer from poor sample efficiency when rewards are delayed and sparse. We introduce a solution that enables agents to learn temporally extended actions at multiple levels of abstraction in a…

Machine Learning · Computer Science · 2019-03-11 · Andrew Levy, Robert Platt, Kate Saenko

Safe and Sample-efficient Reinforcement Learning for Clustered Dynamic Environments

This study proposes a safe and sample-efficient reinforcement learning (RL) framework to address two major challenges in developing applicable RL algorithms: satisfying safety constraints and efficiently learning with limited samples. To…

Machine Learning · Computer Science · 2023-03-28 · Hongyi Chen, Changliu Liu

Sample-Efficient Neurosymbolic Deep Reinforcement Learning

Reinforcement Learning (RL) is a well-established framework for sequential decision-making in complex environments. However, state-of-the-art Deep RL (DRL) algorithms typically require large training datasets and often struggle to…

Artificial Intelligence · Computer Science · 2026-04-13 · Celeste Veronese, Alessandro Farinelli, Daniele Meli

RAPID: An Efficient Reinforcement Learning Algorithm for Small Language Models

Reinforcement learning (RL) has emerged as a promising strategy for finetuning small language models (SLMs) to solve targeted tasks such as math and coding. However, RL algorithms tend to be resource-intensive, taking a significant amount…

Machine Learning · Computer Science · 2025-10-07 · Lianghuan Huang, Sagnik Anupam, Insup Lee, Shuo Li, Osbert Bastani

Match or Replay: Self Imitating Proximal Policy Optimization

Reinforcement Learning (RL) agents often struggle with inefficient exploration, particularly in environments with sparse rewards. Traditional exploration strategies can lead to slow learning and suboptimal performance because agents fail to…

Machine Learning · Computer Science · 2026-03-31 · Gaurav Chaudhary, Laxmidhar Behera, Washim Uddin Mondal

Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration

This paper addresses the problem of learning optimal control policies for systems with uncertain dynamics and high-level control objectives specified as Linear Temporal Logic (LTL) formulas. Uncertainty is considered in the workspace…

Robotics · Computer Science · 2024-10-17 · Yiannis Kantaros, Jun Wang

Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires…

Computation and Language · Computer Science · 2019-07-31 · Yang Gao, Christian M. Meyer, Mohsen Mesgar, Iryna Gurevych

Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs

In reward-free reinforcement learning (RL), an agent explores the environment first without any reward information, in order to achieve certain learning goals afterwards for any given reward. In this paper we focus on reward-free RL under…

Machine Learning · Computer Science · 2023-03-21 · Yuan Cheng, Ruiquan Huang, Jing Yang, Yingbin Liang

Programmatic Reward Design by Example

Reward design is a fundamental problem in reinforcement learning (RL). A misspecified or poorly designed reward can result in low sample efficiency and undesired behaviors. In this paper, we propose the idea of programmatic reward design,…

Machine Learning · Computer Science · 2022-01-10 · Weichao Zhou, Wenchao Li

Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however,…

Machine Learning · Computer Science · 2022-01-19 · Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, Sheila A. McIlraith

Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration

Recent advancements in deep reinforcement learning (RL) have demonstrated notable progress in sample efficiency, spanning both model-based and model-free paradigms. Despite the identification and mitigation of specific bottlenecks in prior…

Machine Learning · Computer Science · 2024-04-02 · Yibo Wang, Jiang Zhao

Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning

Reward shaping is a technique in reinforcement learning that addresses the sparse-reward problem by providing more frequent and informative rewards. We introduce a self-adaptive and highly efficient reward shaping mechanism that…

Machine Learning · Computer Science · 2025-03-03 · Haozhe Ma, Zhengding Luo, Thanh Vinh Vo, Kuankuan Sima, Tze-Yun Leong

On Reward-Free Reinforcement Learning with Linear Function Approximation

Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest. During the exploration phase, an agent collects samples without using…

Machine Learning · Computer Science · 2020-06-22 · Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov

Efficient Sampling-Based Maximum Entropy Inverse Reinforcement Learning with Application to Autonomous Driving

In the past decades, we have witnessed significant progress in the domain of autonomous driving. Advanced techniques based on optimization and reinforcement learning (RL) become increasingly powerful at solving the forward problem: given…

Robotics · Computer Science · 2020-06-25 · Zheng Wu, Liting Sun, Wei Zhan, Chenyu Yang, Masayoshi Tomizuka

To the Max: Reinventing Reward in Reinforcement Learning

In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the…

Machine Learning · Computer Science · 2025-02-25 · Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt