Related papers: Offline Preference-Based Apprenticeship Learning

Benchmarks and Algorithms for Offline Preference-Based Reward Learning

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the…

Machine Learning · Computer Science 2023-01-05 Daniel Shin, Anca D. Dragan, Daniel S. Brown

Preference Elicitation for Offline Reinforcement Learning

Applying reinforcement learning (RL) to real-world problems is often made challenging by the inability to interact with the environment and the difficulty of designing reward functions. Offline RL addresses the first challenge by…

Machine Learning · Computer Science 2025-03-03 Alizée Pace, Bernhard Schölkopf, Gunnar Rätsch, Giorgia Ramponi

Online Policy Learning from Offline Preferences

In preference-based reinforcement learning (PbRL), a reward function is learned from a type of human feedback called preference. To expedite preference collection, recent works have leveraged offline preferences, which are…

Machine Learning · Computer Science 2024-03-18 Guoxi Zhang, Han Bao, Hisashi Kashima

Hindsight Preference Learning for Offline Preference-based Reinforcement Learning

Offline preference-based reinforcement learning (RL), which focuses on optimizing policies using human preferences between pairs of trajectory segments selected from an offline dataset, has emerged as a practical avenue for RL applications.…

Machine Learning · Computer Science 2024-07-08 Chen-Xiao Gao, Shengjun Fang, Chenjun Xiao, Yang Yu, Zongzhang Zhang

Offline Safe Policy Optimization From Heterogeneous Feedback

Offline Preference-based Reinforcement Learning (PbRL) learns rewards and policies aligned with human preferences without the need for extensive reward engineering and direct interaction with human annotators. However, ensuring safety…

Artificial Intelligence · Computer Science 2025-12-24 Ze Gong, Pradeep Varakantham, Akshat Kumar

Beyond Reward: Offline Preference-guided Policy Optimization

This study focuses on the topic of offline preference-based reinforcement learning (PbRL), a variant of conventional reinforcement learning that dispenses with the need for online interaction or specification of reward functions. Instead,…

Machine Learning · Computer Science 2023-06-12 Yachen Kang, Diyuan Shi, Jinxin Liu, Li He, Donglin Wang

Provable Offline Preference-Based Reinforcement Learning

In this paper, we investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback where feedback is available in the form of preference between trajectory pairs rather than explicit rewards. Our…

Machine Learning · Computer Science 2023-10-03 Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

Adaptive Policy Learning for Offline-to-Online Reinforcement Learning

Conventional reinforcement learning (RL) needs an environment to collect fresh data, which is impractical when online interactions are costly. Offline RL provides an alternative solution by directly learning from the previously collected…

Machine Learning · Computer Science 2023-03-15 Han Zheng, Xufang Luo, Pengfei Wei, Xuan Song, Dongsheng Li, Jing Jiang

Boosting Offline Reinforcement Learning with Action Preference Query

Training practical agents usually involves offline and online reinforcement learning (RL) to balance the policy's performance and interaction costs. In particular, online fine-tuning has become a commonly used method to correct the erroneous…

Machine Learning · Computer Science 2023-06-07 Qisen Yang, Shenzhi Wang, Matthieu Gaetan Lin, Shiji Song, Gao Huang

Offline Reinforcement Learning as Anti-Exploration

Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system. An agent in this setting should avoid selecting actions whose consequences cannot be predicted from the…

Machine Learning · Computer Science 2021-06-14 Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin, Matthieu Geist

An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning

In recent years, significant progress has been made in multi-objective reinforcement learning (RL) research, which aims to balance multiple objectives by incorporating preferences for each objective. In most existing studies, specific…

Machine Learning · Computer Science 2024-09-17 Qian Lin, Zongkai Liu, Danying Mo, Chao Yu

Direct Preference-based Policy Optimization without Reward Modeling

Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preference, which is particularly useful when formulating a reward function is challenging. Existing PbRL methods generally involve a…

Machine Learning · Computer Science 2023-10-30 Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, Kyung-Min Kim, Hyun Oh Song

LEASE: Offline Preference-based Reinforcement Learning with High Sample Efficiency

Offline preference-based reinforcement learning (PbRL) provides an effective way to overcome the challenges of designing reward and the high costs of online interaction. However, since labeling preference needs real-time human feedback,…

Machine Learning · Computer Science 2026-02-10 Xiao-Yin Liu, Guotao Li, Xiao-Hu Zhou, Zeng-Guang Hou

Leveraging Offline Data in Online Reinforcement Learning

Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL. In the online RL setting, the agent has no prior knowledge of the environment, and must interact with it in order to find an…

Machine Learning · Computer Science 2023-07-21 Andrew Wagenmaker, Aldo Pacchiano

Reinforcement Learning from Diverse Human Preferences

The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques. Describing an agent's desired behaviors and properties can be difficult, even for experts. A new…

Machine Learning · Computer Science 2024-05-09 Wanqi Xue, Bo An, Shuicheng Yan, Zhongwen Xu

Binary Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning

Offline reinforcement learning has become one of the most practical RL settings. However, most existing works on offline RL focus on the standard setting with scalar reward feedback. It remains unknown how to universally transfer the…

Machine Learning · Computer Science 2024-10-25 Yinglun Xu, David Zhu, Rohan Gumaste, Gagandeep Singh

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for…

Machine Learning · Computer Science 2024-05-30 Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He

The Challenges of Exploration for Offline Reinforcement Learning

Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour. The second step has been widely studied in the…

Machine Learning · Computer Science 2022-02-22 Nathan Lambert, Markus Wulfmeier, William Whitney, Arunkumar Byravan, Michael Bloesch, Vibhavari Dasagi, Tim Hertweck, Martin Riedmiller

OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration

Preference-based reinforcement learning (PbRL) can help avoid sophisticated reward designs and align better with human intentions, showing great promise in various real-world applications. However, obtaining human feedback for preferences…

Machine Learning · Computer Science 2026-04-06 Yiqin Yang, Hao Hu, Yihuan Mao, Jin Zhang, Chengjie Wu, Yuhua Jiang, Xu Yang, Runpeng Xie, Yi Fan, Bo Liu, Yang Gao, Bo Xu, Chongjie Zhang

Offline Reinforcement Learning Hands-On

Offline Reinforcement Learning (RL) aims to turn large datasets into powerful decision-making engines without any online interactions with the environment. This great promise has motivated a large amount of research that hopes to replicate…

Machine Learning · Computer Science 2020-12-01 Louis Monier, Jakub Kmec, Alexandre Laterre, Thomas Pierrot, Valentin Courgeau, Olivier Sigaud, Karim Beguir