Related papers: Data Driven Reward Initialization for Preference b…

Direct Preference-based Policy Optimization without Reward Modeling

Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preference, which is particularly useful when formulating a reward function is challenging. Existing PbRL methods generally involve a…

Machine Learning · Computer Science 2023-10-30 Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, Kyung-Min Kim, Hyun Oh Song

Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning

Preference Based Reinforcement Learning has shown much promise for utilizing human binary feedback on queried trajectory pairs to recover the underlying reward model of the Human in the Loop (HiL). While works have attempted to better…

Robotics · Computer Science 2023-02-20 Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati

Advances in Preference-based Reinforcement Learning: A Review

Reinforcement Learning (RL) algorithms suffer from the dependency on accurately engineered reward functions to properly guide the learning agents to do the required tasks. Preference-based reinforcement learning (PbRL) addresses that by…

Artificial Intelligence · Computer Science 2024-08-23 Youssef Abdelkareem, Shady Shehata, Fakhri Karray

Provable Reward-Agnostic Preference-Based Reinforcement Learning

Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals. While PbRL has demonstrated…

Machine Learning · Computer Science 2024-04-18 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models

Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a…

Machine Learning · Computer Science 2024-02-13 Yi Liu, Gaurav Datta, Ellen Novoseller, Daniel S. Brown

Preference-based Reinforcement Learning with Finite-Time Guarantees

Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning by preferences to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or…

Machine Learning · Computer Science 2020-10-27 Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski

PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning

Preference-based reinforcement learning (PbRL) has emerged as a promising approach for learning behaviors from human feedback without predefined reward functions. However, current PbRL methods face a critical challenge in effectively…

Artificial Intelligence · Computer Science 2025-06-17 Brahim Driss, Alex Davey, Riad Akrour

Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation

Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of PbRL is its dependency on substantial human feedback. This dependency stems…

Machine Learning · Computer Science 2024-05-30 Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han

Symbol Guided Hindsight Priors for Reward Learning from Human Preferences

Specifying rewards for reinforcement learned (RL) agents is challenging. Preference-based RL (PbRL) mitigates these challenges by inferring a reward from feedback over sets of trajectories. However, the effectiveness of PbRL is limited by…

Machine Learning · Computer Science 2022-10-20 Mudit Verma, Katherine Metcalf

Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards

Preference-based reinforcement learning (PbRL) aligns a robot behavior with human preferences via a reward function learned from binary feedback over agent behaviors. We show that dynamics-aware reward functions improve the sample…

Artificial Intelligence · Computer Science 2024-02-29 Katherine Metcalf, Miguel Sarabia, Natalie Mackraz, Barry-John Theobald

Information Directed Reward Learning for Reinforcement Learning

For many reinforcement learning (RL) applications, specifying a reward is difficult. This paper considers an RL setting where the agent obtains information about the reward only by querying an expert that can, for example, evaluate…

Machine Learning · Computer Science 2022-02-01 David Lindner, Matteo Turchetta, Sebastian Tschiatschek, Kamil Ciosek, Andreas Krause

Provable Offline Preference-Based Reinforcement Learning

In this paper, we investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback where feedback is available in the form of preference between trajectory pairs rather than explicit rewards. Our…

Machine Learning · Computer Science 2023-10-03 Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

Residual Reward Models for Preference-based Reinforcement Learning

Preference-based Reinforcement Learning (PbRL) provides a way to learn high-performance policies in environments where the reward signal is hard to specify, avoiding heuristic and time-consuming reward design. However, PbRL can suffer from…

Machine Learning · Computer Science 2025-07-02 Chenyang Cao, Miguel Rogel-García, Mohamed Nabail, Xueqian Wang, Nicholas Rhinehart

Hindsight PRIORs for Reward Learning from Human Preferences

Preference based Reinforcement Learning (PbRL) removes the need to hand specify a reward function by learning a reward from preference feedback over policy behaviors. Current approaches to PbRL do not address the credit assignment problem…

Machine Learning · Computer Science 2024-04-16 Mudit Verma, Katherine Metcalf

Best Policy Learning from Trajectory Preference Feedback

Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful approach for aligning generative models, but its reliance on learned reward models makes it vulnerable to mis-specification and reward hacking. Preference-based…

Machine Learning · Computer Science 2026-04-23 Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen

Inverse Preference Learning: Preference-based RL without a Reward Function

Reward functions are difficult to design and often hard to align with human intent. Preference-based Reinforcement Learning (RL) algorithms address these problems by learning reward functions from human feedback. However, the majority of…

Machine Learning · Computer Science 2023-11-28 Joey Hejna, Dorsa Sadigh

Tell me why: Training preferences-based RL with human preferences and step-level explanations

Human-in-the-loop reinforcement learning allows the training of agents through various interfaces, even for non-expert humans. Recently, preference-based methods (PbRL), where the human has to give his preference over two trajectories,…

Artificial Intelligence · Computer Science 2024-08-06 Jakob Karalus

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

Conveying complex objectives to reinforcement learning (RL) agents often requires meticulous reward engineering. Preference-based RL methods are able to learn a more flexible reward model based on human preferences by actively incorporating…

Machine Learning · Computer Science 2022-05-26 Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel

SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning

Preference-based Reinforcement Learning (PbRL) methods provide a solution to avoid reward engineering by learning reward models based on human preferences. However, poor feedback- and sample-efficiency still remain the problems that hinder…

Robotics · Computer Science 2026-05-22 Hexian Ni, Tao Lu, Haoyuan Hu, Yinghao Cai, Shuo Wang

Preference-Guided Reinforcement Learning for Efficient Exploration

In this paper, we investigate preference-based reinforcement learning (PbRL), which enables reinforcement learning (RL) agents to learn from human feedback. This is particularly valuable when defining a fine-grain reward function is not…

Machine Learning · Computer Science 2025-11-11 Guojian Wang, Jianxiang Liu, Xinyuan Li, Faguo Wu, Xiao Zhang, Tianyuan Chen, Xuyang Chen