Related papers: Sample-Efficient Preference-based Reinforcement Le…

Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models

Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a…

Machine Learning · Computer Science 2024-02-13 Yi Liu, Gaurav Datta, Ellen Novoseller, Daniel S. Brown

Provable Reward-Agnostic Preference-Based Reinforcement Learning

Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals. While PbRL has demonstrated…

Machine Learning · Computer Science 2024-04-18 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation

Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of PbRL is its dependency on substantial human feedback. This dependency stems…

Machine Learning · Computer Science 2024-05-30 Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han

Direct Preference-based Policy Optimization without Reward Modeling

Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preference, which is particularly useful when formulating a reward function is challenging. Existing PbRL methods generally involve a…

Machine Learning · Computer Science 2023-10-30 Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, Kyung-Min Kim, Hyun Oh Song

FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions

Preference-based reinforcement learning (PbRL) is a suitable approach for style adaptation of pre-trained robotic behavior: adapting the robot's policy to follow human user preferences while still being able to perform the original task.…

Robotics · Computer Science 2025-04-15 Daniel Marta, Simon Holk, Miguel Vasco, Jens Lundell, Timon Homberger, Finn Busch, Olov Andersson, Danica Kragic, Iolanda Leite

Advances in Preference-based Reinforcement Learning: A Review

Reinforcement Learning (RL) algorithms suffer from the dependency on accurately engineered reward functions to properly guide the learning agents to do the required tasks. Preference-based reinforcement learning (PbRL) addresses that by…

Artificial Intelligence · Computer Science 2024-08-23 Youssef Abdelkareem, Shady Shehata, Fakhri Karray

DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition

Preference-based Reinforcement Learning (PbRL) enables policy learning through simple queries comparing trajectories from a single policy. While human responses to these queries make it possible to learn policies aligned with human…

Robotics · Computer Science 2026-01-22 Yuki Kadokawa, Jonas Frey, Takahiro Miki, Takamitsu Matsubara, Marco Hutter

Personalization in Human-Robot Interaction through Preference-based Action Representation Learning

Preference-based reinforcement learning (PbRL) has shown significant promise for personalization in human-robot interaction (HRI) by explicitly integrating human preferences into the robot learning process. However, existing practices often…

Robotics · Computer Science 2025-03-12 Ruiqi Wang, Dezhong Zhao, Dayoon Suh, Ziqin Yuan, Guohua Chen, Byung-Cheol Min

Preference-based Reinforcement Learning with Finite-Time Guarantees

Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning by preferences to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or…

Machine Learning · Computer Science 2020-10-27 Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski

Hindsight PRIORs for Reward Learning from Human Preferences

Preference-based Reinforcement Learning (PbRL) removes the need to hand-specify a reward function by learning a reward from preference feedback over policy behaviors. Current approaches to PbRL do not address the credit assignment problem…

Machine Learning · Computer Science 2024-04-16 Mudit Verma, Katherine Metcalf

Residual Reward Models for Preference-based Reinforcement Learning

Preference-based Reinforcement Learning (PbRL) provides a way to learn high-performance policies in environments where the reward signal is hard to specify, avoiding heuristic and time-consuming reward design. However, PbRL can suffer from…

Machine Learning · Computer Science 2025-07-02 Chenyang Cao, Miguel Rogel-García, Mohamed Nabail, Xueqian Wang, Nicholas Rhinehart

Query-Policy Misalignment in Preference-Based Reinforcement Learning

Preference-based reinforcement learning (PbRL) provides a natural way to align RL agents' behavior with human desired outcomes, but is often restrained by costly human feedback. To improve feedback efficiency, most existing PbRL methods…

Machine Learning · Computer Science 2024-07-08 Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang

PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning

Preference-based reinforcement learning (PbRL) has emerged as a promising approach for learning behaviors from human feedback without predefined reward functions. However, current PbRL methods face a critical challenge in effectively…

Artificial Intelligence · Computer Science 2025-06-17 Brahim Driss, Alex Davey, Riad Akrour

Data Driven Reward Initialization for Preference based Reinforcement Learning

Preference-based Reinforcement Learning (PbRL) methods utilize binary feedback from the human in the loop (HiL) over queried trajectory pairs to learn a reward model in an attempt to approximate the human's underlying reward function…

Machine Learning · Computer Science 2023-02-20 Mudit Verma, Subbarao Kambhampati

Online Policy Learning from Offline Preferences

In preference-based reinforcement learning (PbRL), a reward function is learned from a type of human feedback called preference. To expedite preference collection, recent works have leveraged offline preferences, which are…

Machine Learning · Computer Science 2024-03-18 Guoxi Zhang, Han Bao, Hisashi Kashima

Learning Acrobatic Flight from Preferences

Preference-based reinforcement learning (PbRL) enables agents to learn control policies without requiring manually designed reward functions, making it well-suited for tasks where objectives are difficult to formalize or inherently…

Robotics · Computer Science 2026-03-04 Colin Merk, Ismail Geles, Jiaxu Xing, Angel Romero, Giorgia Ramponi, Davide Scaramuzza

Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions

The potential of reinforcement learning (RL) to deliver aligned and performant agents is partially bottlenecked by the reward engineering problem. One alternative to heuristic trial-and-error is preference-based RL (PbRL), where a reward…

Machine Learning · Computer Science 2021-12-22 Tom Bewley, Freddy Lecue

Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning

Preference-based reinforcement learning (RL) algorithms help avoid the pitfalls of hand-crafted reward functions by distilling them from human preference feedback, but they remain impractical due to the burdensome number of labels required…

Machine Learning · Computer Science 2022-11-15 Katherine Metcalf, Miguel Sarabia, Barry-John Theobald

SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning

Preference-based Reinforcement Learning (PbRL) methods provide a solution to avoid reward engineering by learning reward models based on human preferences. However, poor feedback- and sample-efficiency still remain the problems that hinder…

Robotics · Computer Science 2026-05-22 Hexian Ni, Tao Lu, Haoyuan Hu, Yinghao Cai, Shuo Wang

Preference-based Reinforcement Learning (PbRL) entails a variety of approaches for aligning models with human intent to alleviate the burden of reward engineering. However, most previous PbRL work has not investigated the robustness to…

Machine Learning · Computer Science 2025-06-17 Sara Rajaram, R. James Cotton, Fabian H. Sinz