Related papers: A Generalized Acquisition Function for Preference-…

Preference-based Learning of Reward Function Features

Preference-based learning of reward functions, where the reward function is learned using comparison data, has been well studied for complex robotic tasks such as autonomous driving. Existing algorithms have focused on learning reward…

Robotics · Computer Science 2021-03-05 Sydney M. Katz, Amir Maleki, Erdem Bıyık, Mykel J. Kochenderfer

Batch Active Preference-Based Learning of Reward Functions

Data generation and labeling are usually an expensive part of learning for robotics. While active learning methods are commonly used to tackle the former problem, preference-based learning is a concept that attempts to solve the latter by…

Machine Learning · Computer Science 2018-10-11 Erdem Bıyık, Dorsa Sadigh

Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences

Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from…

Robotics · Computer Science 2021-08-05 Erdem Bıyık, Dylan P. Losey, Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh

Learning Preferences for Interactive Autonomy

When robots enter everyday human environments, they need to understand their tasks and how they should perform those tasks. To encode these, reward functions, which specify the objective of a robot, are employed. However, designing reward…

Robotics · Computer Science 2022-10-21 Erdem Bıyık

Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input

Humans use social context to specify preferences over behaviors, i.e. their reward functions. Yet, algorithms for inferring reward models from preference data do not take this social learning view into account. Inspired by pragmatic human…

Machine Learning · Computer Science 2024-05-24 Andi Peng, Yuying Sun, Tianmin Shu, David Abel

Batch Active Learning of Reward Functions from Human Preferences

Data generation and labeling are often expensive in robot learning. Preference-based learning is a concept that enables reliable labeling by querying users with preference questions. Active querying methods are commonly employed in…

Machine Learning · Computer Science 2024-02-27 Erdem Bıyık, Nima Anari, Dorsa Sadigh

Active Preference-Based Gaussian Process Regression for Reward Learning

Designing reward functions is a challenging problem in AI and robotics. Humans usually have a difficult time directly specifying all the desirable behaviors that a robot needs to optimize. One common approach is to learn reward functions…

Robotics · Computer Science 2020-06-05 Erdem Bıyık, Nicolas Huynh, Mykel J. Kochenderfer, Dorsa Sadigh

Active Preference Learning using Maximum Regret

We study active preference learning as a framework for intuitively specifying the behaviour of autonomous robots. In active preference learning, a user chooses the preferred behaviour from a set of alternatives, from which the robot learns…

Robotics · Computer Science 2020-09-30 Nils Wilde, Dana Kulic, Stephen L. Smith

Learning from Richer Human Guidance: Augmenting Comparison-Based Learning with Feature Queries

We focus on learning the desired objective function for a robot. Although trajectory demonstrations can be very informative of the desired objective, they can also be difficult for users to provide. Answers to comparison queries, asking…

Artificial Intelligence · Computer Science 2018-02-07 Chandrayee Basu, Mukesh Singhal, Anca D. Dragan

Towards Preference Learning for Autonomous Ground Robot Navigation Tasks

We are interested in the design of autonomous robot behaviors that learn the preferences of users over continued interactions, with the goal of efficiently executing navigation behaviors in a way that the user expects. In this paper, we…

Robotics · Computer Science 2020-11-06 Cory Hayes, Matthew Marge

Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference

Recent advances in reinforcement learning have inspired increasing interest in learning user modeling adaptively through dynamic interactions, e.g., in reinforcement learning based recommender systems. Reward function is crucial for most of…

Machine Learning · Computer Science 2021-05-06 Xiaocong Chen, Lina Yao, Xianzhi Wang, Aixin Sun, Wenjie Zhang, Quan Z. Sheng

Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards

Preference-based reinforcement learning (PbRL) aligns a robot behavior with human preferences via a reward function learned from binary feedback over agent behaviors. We show that dynamics-aware reward functions improve the sample…

Artificial Intelligence · Computer Science 2024-02-29 Katherine Metcalf, Miguel Sarabia, Natalie Mackraz, Barry-John Theobald

Capturing Individual Human Preferences with Reward Features

Reinforcement learning from human feedback usually models preferences using a reward function that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for…

Artificial Intelligence · Computer Science 2026-02-20 André Barreto, Vincent Dumoulin, Yiran Mao, Mark Rowland, Nicolas Perez-Nieves, Bobak Shahriari, Yann Dauphin, Doina Precup, Hugo Larochelle

Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems

For a real-world decision-making problem, the reward function often needs to be engineered or learned. A popular approach is to utilize human feedback to learn a reward function for training. The most straightforward way to do so is to ask…

Machine Learning · Computer Science 2023-10-31 Xiang Ji, Huazheng Wang, Minshuo Chen, Tuo Zhao, Mengdi Wang

Asking Easy Questions: A User-Friendly Approach to Active Reward Learning

Robots can learn the right reward function by querying a human expert. Existing approaches attempt to choose questions where the robot is most uncertain about the human's response; however, they do not consider how easy it will be for the…

Robotics · Computer Science 2019-10-11 Erdem Bıyık, Malayandi Palan, Nicholas C. Landolfi, Dylan P. Losey, Dorsa Sadigh

Reward-rational (implicit) choice: A unifying formalism for reward learning

It is often difficult to hand-specify what the correct reward function is for a task, so researchers have instead aimed to learn reward functions from human behavior or feedback. The types of behavior interpreted as evidence of the reward…

Machine Learning · Computer Science 2020-12-14 Hong Jun Jeon, Smitha Milli, Anca D. Dragan

Learning Reward Functions by Integrating Human Demonstrations and Preferences

Our goal is to accurately and efficiently learn reward functions for autonomous robots. Current approaches to this problem include inverse reinforcement learning (IRL), which uses expert demonstrations, and preference-based learning, which…

Robotics · Computer Science 2019-06-24 Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh

Learning from Preferences and Mixed Demonstrations in General Settings

Reinforcement learning is a general method for learning in sequential settings, but it can often be difficult to specify a good reward function when the task is complex. In these cases, preference feedback or expert demonstrations can be…

Machine Learning · Computer Science 2025-08-20 Jason R Brown, Carl Henrik Ek, Robert D Mullins

Causal Confusion and Reward Misidentification in Preference-Based Reward Learning

Learning policies via preference-based reward learning is an increasingly popular method for customizing agent behavior, but has been shown anecdotally to be prone to spurious correlations and reward hacking behaviors. While much prior work…

Machine Learning · Computer Science 2023-03-21 Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D. Dragan, Daniel S. Brown

Active Reward Learning from Online Preferences

Robot policies need to adapt to human preferences and/or new environments. Human experts may have the domain knowledge required to help robots achieve this adaptation. However, existing works often require costly offline re-training on…

Machine Learning · Computer Science 2023-02-28 Vivek Myers, Erdem Bıyık, Dorsa Sadigh