Related papers: Trajectory-Oriented Policy Optimization with Spars…

Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL.…

Machine Learning · Computer Science 2024-01-03 Guojian Wang, Faguo Wu, Xiao Zhang, Ning Guo, Zhiming Zheng

Learning Diverse Policies with Soft Self-Generated Guidance

Reinforcement learning (RL) with sparse and deceptive rewards is challenging because non-zero rewards are rarely obtained. Hence, the gradient calculated by the agent can be stochastic and without valid information. Recent studies that…

Machine Learning · Computer Science 2024-02-08 Guojian Wang, Faguo Wu, Xiao Zhang, Jianxiang Liu

Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully.…

Machine Learning · Computer Science 2022-02-15 Desik Rengarajan, Gargi Vaidya, Akshay Sarvesh, Dileep Kalathil, Srinivas Shakkottai

Match or Replay: Self Imitating Proximal Policy Optimization

Reinforcement Learning (RL) agents often struggle with inefficient exploration, particularly in environments with sparse rewards. Traditional exploration strategies can lead to slow learning and suboptimal performance because agents fail to…

Machine Learning · Computer Science 2026-03-31 Gaurav Chaudhary, Laxmidhar Behera, Washim Uddin Mondal

Overcoming Exploration in Reinforcement Learning with Demonstrations

Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal…

Machine Learning · Computer Science 2018-02-27 Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel

A Rollout-Based Algorithm and Reward Function for Resource Allocation in Business Processes

Resource allocation plays a critical role in minimizing cycle time and improving the efficiency of business processes. Recently, Deep Reinforcement Learning (DRL) has emerged as a powerful technique to optimize resource allocation policies…

Machine Learning · Computer Science 2025-09-03 Jeroen Middelhuis, Zaharah Bukhsh, Ivo Adan, Remco Dijkman

Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations

The sparsity of reward feedback remains a challenging problem in online deep reinforcement learning (DRL). Previous approaches have utilized offline demonstrations to achieve impressive results in multiple hard tasks. However, these…

Machine Learning · Computer Science 2024-10-28 Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen

Optimal Transport for Offline Imitation Learning

With the advent of large datasets, offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment. However, offline RL requires the dataset to…

Machine Learning · Computer Science 2023-03-27 Yicheng Luo, Zhengyao Jiang, Samuel Cohen, Edward Grefenstette, Marc Peter Deisenroth

A Study on Dense and Sparse (Visual) Rewards in Robot Policy Learning

Deep Reinforcement Learning (DRL) is a promising approach for teaching robots new behaviour. However, one of its main limitations is the need for carefully hand-coded reward signals by an expert. We argue that it is crucial to automate the…

Robotics · Computer Science 2021-08-09 Abdalkarim Mohtasib, Gerhard Neumann, Heriberto Cuayahuitl

Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm

During initial iterations of training in most Reinforcement Learning (RL) algorithms, agents perform a significant number of random exploratory steps. In the real world, this can limit the practicality of these algorithms as it can lead to…

Machine Learning · Computer Science 2022-10-17 Ashish Kumar Jayant, Shalabh Bhatnagar

Provable Offline Preference-Based Reinforcement Learning

In this paper, we investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback where feedback is available in the form of preference between trajectory pairs rather than explicit rewards. Our…

Machine Learning · Computer Science 2023-10-03 Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

CROP: Conservative Reward for Model-based Offline Policy Optimization

Offline reinforcement learning (RL) aims to optimize a policy using collected data without online interactions. Model-based approaches are particularly appealing for addressing offline RL challenges because of their capability to mitigate…

Machine Learning · Computer Science 2026-04-14 Hao Li, Xiao-Hu Zhou, Shu-Hai Li, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Zeng-Guang Hou

Offline Primal-Dual Reinforcement Learning for Linear MDPs

Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed dataset of transitions collected by another policy. This problem has attracted a lot of attention recently, but most existing methods with strong…

Machine Learning · Computer Science 2023-05-23 Germano Gabbianelli, Gergely Neu, Nneka Okolo, Matteo Papini

Offline Safe Reinforcement Learning Using Trajectory Classification

Offline safe reinforcement learning (RL) has emerged as a promising approach for learning safe behaviors without engaging in risky online interactions with the environment. Most existing methods in offline safe RL rely on cost constraints…

Machine Learning · Computer Science 2025-04-22 Ze Gong, Akshat Kumar, Pradeep Varakantham

In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning

Offline preference-based reinforcement learning (PbRL) typically operates in two phases: first, use human preferences to learn a reward model and annotate rewards for a reward-free offline dataset; second, learn a policy by optimizing the…

Artificial Intelligence · Computer Science 2024-12-24 Songjun Tu, Jingbo Sun, Qichao Zhang, Yaocheng Zhang, Jia Liu, Ke Chen, Dongbin Zhao

Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward

The remarkable success of reinforcement learning (RL) heavily relies on observing the reward of every visited state-action pair. In many real world applications, however, an agent can observe only a score that represents the quality of the…

Machine Learning · Computer Science 2023-04-20 Tengyu Xu, Yue Wang, Shaofeng Zou, Yingbin Liang

Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage

The goal of an offline reinforcement learning (RL) algorithm is to learn optimal policies using historical (offline) data, without access to the environment for online exploration. One of the main challenges in offline RL is the distribution…

Machine Learning · Computer Science 2023-10-31 Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

A Risk-Sensitive Approach to Policy Optimization

Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy. This differs from human decision-making, where gains and losses are valued differently and…

Machine Learning · Computer Science 2023-11-17 Jared Markowitz, Ryan W. Gardner, Ashley Llorens, Raman Arora, I-Jeng Wang

Combining Benefits from Trajectory Optimization and Deep Reinforcement Learning

Recent breakthroughs both in reinforcement learning and trajectory optimization have made significant advances towards real world robotic system deployment. Reinforcement learning (RL) can be applied to many problems without needing any…

Robotics · Computer Science 2019-10-23 Guillaume Bellegarda, Katie Byl

Reward-Free Exploration for Reinforcement Learning

Exploration is widely regarded as one of the most challenging aspects of reinforcement learning (RL), with many naive approaches succumbing to exponential sample complexity. To isolate the challenges of exploration, we propose a new…

Machine Learning · Computer Science 2020-02-10 Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu