Related papers: Efficient Preference-Based Reinforcement Learning …

Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards

Preference-based reinforcement learning (PbRL) aligns a robot's behavior with human preferences via a reward function learned from binary feedback over agent behaviors. We show that dynamics-aware reward functions improve the sample…

Artificial Intelligence · Computer Science 2024-02-29 Katherine Metcalf, Miguel Sarabia, Natalie Mackraz, Barry-John Theobald

Direct Preference-based Policy Optimization without Reward Modeling

Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preference, which is particularly useful when formulating a reward function is challenging. Existing PbRL methods generally involve a…

Machine Learning · Computer Science 2023-10-30 Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, Kyung-Min Kim, Hyun Oh Song

Advances in Preference-based Reinforcement Learning: A Review

Reinforcement Learning (RL) algorithms suffer from the dependency on accurately engineered reward functions to properly guide the learning agents to do the required tasks. Preference-based reinforcement learning (PbRL) addresses that by…

Artificial Intelligence · Computer Science 2024-08-23 Youssef Abdelkareem, Shady Shehata, Fakhri Karray

Provable Reward-Agnostic Preference-Based Reinforcement Learning

Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals. While PbRL has demonstrated…

Machine Learning · Computer Science 2024-04-18 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

Residual Reward Models for Preference-based Reinforcement Learning

Preference-based Reinforcement Learning (PbRL) provides a way to learn high-performance policies in environments where the reward signal is hard to specify, avoiding heuristic and time-consuming reward design. However, PbRL can suffer from…

Machine Learning · Computer Science 2025-07-02 Chenyang Cao, Miguel Rogel-García, Mohamed Nabail, Xueqian Wang, Nicholas Rhinehart

Preference-based Reinforcement Learning with Finite-Time Guarantees

Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning by preferences to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or…

Machine Learning · Computer Science 2020-10-27 Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski

Personalization in Human-Robot Interaction through Preference-based Action Representation Learning

Preference-based reinforcement learning (PbRL) has shown significant promise for personalization in human-robot interaction (HRI) by explicitly integrating human preferences into the robot learning process. However, existing practices often…

Robotics · Computer Science 2025-03-12 Ruiqi Wang, Dezhong Zhao, Dayoon Suh, Ziqin Yuan, Guohua Chen, Byung-Cheol Min

FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions

Preference-based reinforcement learning (PbRL) is a suitable approach for style adaptation of pre-trained robotic behavior: adapting the robot's policy to follow human user preferences while still being able to perform the original task.…

Robotics · Computer Science 2025-04-15 Daniel Marta, Simon Holk, Miguel Vasco, Jens Lundell, Timon Homberger, Finn Busch, Olov Andersson, Danica Kragic, Iolanda Leite

Physics-Informed Model-Based Reinforcement Learning

We apply reinforcement learning (RL) to robotics tasks. One of the drawbacks of traditional RL algorithms has been their poor sample efficiency. One approach to improve the sample efficiency is model-based RL. In our model-based RL…

Machine Learning · Computer Science 2023-05-16 Adithya Ramesh, Balaraman Ravindran

PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning

Preference-based reinforcement learning (PbRL) has emerged as a promising approach for learning behaviors from human feedback without predefined reward functions. However, current PbRL methods face a critical challenge in effectively…

Artificial Intelligence · Computer Science 2025-06-17 Brahim Driss, Alex Davey, Riad Akrour

PRIMT: Preference-based Reinforcement Learning with Multimodal Feedback and Trajectory Synthesis from Foundation Models

Preference-based reinforcement learning (PbRL) has emerged as a promising paradigm for teaching robots complex behaviors without reward engineering. However, its effectiveness is often limited by two critical challenges: the reliance on…

Robotics · Computer Science 2025-12-02 Ruiqi Wang, Dezhong Zhao, Ziqin Yuan, Tianyu Shao, Guohua Chen, Dominic Kao, Sungeun Hong, Byung-Cheol Min

PrefMMT: Modeling Human Preferences in Preference-based Reinforcement Learning with Multimodal Transformers

Preference-based reinforcement learning (PbRL) shows promise in aligning robot behaviors with human preferences, but its success depends heavily on the accurate modeling of human preferences through reward models. Most methods adopt…

Robotics · Computer Science 2025-03-12 Dezhong Zhao, Ruiqi Wang, Dayoon Suh, Taehyeon Kim, Ziqin Yuan, Byung-Cheol Min, Guohua Chen

RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

Preference-based Reinforcement Learning (PbRL) circumvents the need for reward engineering by harnessing human preferences as the reward signal. However, current PbRL methods excessively depend on high-quality feedback from domain experts,…

Machine Learning · Computer Science 2024-10-29 Jie Cheng, Gang Xiong, Xingyuan Dai, Qinghai Miao, Yisheng Lv, Fei-Yue Wang

Two-Step Offline Preference-Based Reinforcement Learning with Constrained Actions

Preference-based reinforcement learning (PBRL) in the offline setting has succeeded greatly in industrial applications such as chatbots. A two-step learning framework where one applies a reinforcement learning step after a reward modeling…

Machine Learning · Computer Science 2024-10-28 Yinglun Xu, Tarun Suresh, Rohan Gumaste, David Zhu, Ruirui Li, Zhengyang Wang, Haoming Jiang, Xianfeng Tang, Qingyu Yin, Monica Xiao Cheng, Qi Zeng, Chao Zhang, Gagandeep Singh

DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition

Preference-based Reinforcement Learning (PbRL) enables policy learning through simple queries comparing trajectories from a single policy. While human responses to these queries make it possible to learn policies aligned with human…

Robotics · Computer Science 2026-01-22 Yuki Kadokawa, Jonas Frey, Takahiro Miki, Takamitsu Matsubara, Marco Hutter

Physics-informed Dyna-Style Model-Based Deep Reinforcement Learning for Dynamic Control

Model-based reinforcement learning (MBRL) is believed to have much higher sample efficiency compared to model-free algorithms by learning a predictive model of the environment. However, the performance of MBRL highly relies on the quality…

Machine Learning · Computer Science 2022-11-16 Xin-Yang Liu, Jian-Xun Wang

Beyond Human Preferences: Exploring Reinforcement Learning Trajectory Evaluation and Improvement through LLMs

Reinforcement learning (RL) faces challenges in evaluating policy trajectories within intricate game tasks due to the difficulty in designing comprehensive and precise reward functions. This inherent difficulty curtails the broader…

Artificial Intelligence · Computer Science 2024-07-02 Zichao Shen, Tianchen Zhu, Qingyun Sun, Shiqi Gao, Jianxin Li

Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation

Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of PbRL is its dependency on substantial human feedback. This dependency stems…

Machine Learning · Computer Science 2024-05-30 Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han

A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning

Many sequential decision-making tasks involve optimizing multiple conflicting objectives, requiring policies that adapt to different user preferences. In multi-objective reinforcement learning (MORL), one widely studied approach addresses…

Machine Learning · Computer Science 2026-04-28 Ying-Tu Chen, Wei Hung, Bing-Shu Wu, Zhang-Wei Hong, Ping-Chun Hsieh

Data Driven Reward Initialization for Preference based Reinforcement Learning

Preference-based Reinforcement Learning (PbRL) methods utilize binary feedback from the human in the loop (HiL) over queried trajectory pairs to learn a reward model in an attempt to approximate the human's underlying reward function…

Machine Learning · Computer Science 2023-02-20 Mudit Verma, Subbarao Kambhampati