Related papers: RL agents Implicitly Learning Human Preferences

Learning the Preferences of a Learning Agent

For AI systems to be useful to humans, they must understand and act in accordance with our values and preferences. Since specifying preferences is a hard task, inverse reinforcement learning (IRL) aims to develop methods that allow for…

Artificial Intelligence · Computer Science 2026-05-12 Karim Abdel Sadek, Mark Bedaywi, Rhys Gould, Stuart Russell

Preferences Implicit in the State of the World

Reinforcement learning (RL) agents optimize only the features specified in a reward function and are indifferent to anything left out inadvertently. This means that we must not only specify what to do, but also the much larger space of what…

Machine Learning · Computer Science 2019-04-22 Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan

Learning to Incentivize Other Learning Agents

The challenge of developing powerful and general Reinforcement Learning (RL) agents has received increasing attention in recent years. Much of this effort has focused on the single-agent setting, in which an agent maximizes a predefined…

Machine Learning · Computer Science 2020-10-21 Jiachen Yang, Ang Li, Mehrdad Farajtabar, Peter Sunehag, Edward Hughes, Hongyuan Zha

Offline Preference-Based Apprenticeship Learning

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the…

Machine Learning · Computer Science 2022-02-18 Daniel Shin, Daniel S. Brown, Anca D. Dragan

Partial Identifiability in Inverse Reinforcement Learning For Agents With Non-Exponential Discounting

The aim of inverse reinforcement learning (IRL) is to infer an agent's preferences from observing their behaviour. Usually, preferences are modelled as a reward function, $R$, and behaviour is modelled as a policy, $\pi$. One of the central…

Machine Learning · Computer Science 2024-12-17 Joar Skalse, Alessandro Abate

Reinforcement Learning from Diverse Human Preferences

The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques. Describing an agent's desired behaviors and properties can be difficult, even for experts. A new…

Machine Learning · Computer Science 2024-05-09 Wanqi Xue, Bo An, Shuicheng Yan, Zhongwen Xu

Deep reinforcement learning from human preferences

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human…

Machine Learning · Statistics 2023-02-20 Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

Contrastive Explanations for Comparing Preferences of Reinforcement Learning Agents

In complex tasks where the reward function is not straightforward and consists of a set of objectives, multiple reinforcement learning (RL) policies that perform the task adequately but employ different strategies can be trained by adjusting…

Artificial Intelligence · Computer Science 2021-12-20 Jasmina Gajcin, Rahul Nair, Tejaswini Pedapati, Radu Marinescu, Elizabeth Daly, Ivana Dusparic

APRIL: Active Preference-learning based Reinforcement Learning

This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics for instance, the expert can hardly design a reward function or demonstrate the target behavior, forbidding the use of both…

Machine Learning · Computer Science 2012-08-07 Riad Akrour, Marc Schoenauer, Michèle Sebag

Benchmarks and Algorithms for Offline Preference-Based Reward Learning

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the…

Machine Learning · Computer Science 2023-01-05 Daniel Shin, Anca D. Dragan, Daniel S. Brown

Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

Reward models (RMs) are crucial for the training and inference-time scaling up of large language models (LLMs). However, existing reward models primarily focus on human preferences, neglecting verifiable correctness signals which have shown…

Computation and Language · Computer Science 2025-02-27 Hao Peng, Yunjia Qi, Xiaozhi Wang, Zijun Yao, Bin Xu, Lei Hou, Juanzi Li

Accounting for Human Learning when Inferring Human Preferences

Inverse reinforcement learning (IRL) is a common technique for inferring human preferences from data. Standard IRL techniques tend to assume that the human demonstrator is stationary, that is, that their policy $\pi$ doesn't change over…

Machine Learning · Computer Science 2020-12-02 Harry Giles, Lawrence Chan

Preference Transformer: Modeling Human Preferences using Transformers for RL

Preference-based reinforcement learning (RL) provides a framework to train agents using human preferences between two behaviors. However, preference-based RL has been challenging to scale since it requires a large amount of human feedback…

Machine Learning · Computer Science 2023-03-03 Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee

On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference

Our goal is for agents to optimize the right reward function, despite how difficult it is for us to specify what that is. Inverse Reinforcement Learning (IRL) enables us to infer reward functions from demonstrations, but it usually assumes…

Machine Learning · Computer Science 2019-06-25 Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan

Experiential Explanations for Reinforcement Learning

Reinforcement learning (RL) systems can be complex and non-interpretable, making it challenging for non-AI experts to understand or intervene in their decisions. This is due in part to the sequential nature of RL in which actions are chosen…

Artificial Intelligence · Computer Science 2025-04-16 Amal Alabdulkarim, Madhuri Singh, Gennie Mansi, Kaely Hall, Upol Ehsan, Mark O. Riedl

Be Considerate: Objectives, Side Effects, and Deciding How to Act

Recent work in AI safety has highlighted that in sequential decision making, objectives are often underspecified or incomplete. This gives discretion to the acting agent to realize the stated objective in ways that may result in undesirable…

Artificial Intelligence · Computer Science 2021-06-07 Parand Alizadeh Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith

Experimental Evidence that Empowerment May Drive Exploration in Sparse-Reward Environments

Reinforcement Learning (RL) is known to be often unsuccessful in environments with sparse extrinsic rewards. A possible countermeasure is to endow RL agents with an intrinsic reward function, or 'intrinsic motivation', which rewards the…

Artificial Intelligence · Computer Science 2021-07-16 Francesco Massari, Martin Biehl, Lisa Meeden, Ryota Kanai

Optimal Policies Tend to Seek Power

Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives. Other researchers point out that RL agents need not have human-like…

Artificial Intelligence · Computer Science 2023-01-31 Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli

Crowd-PrefRL: Preference-Based Reward Learning from Crowds

Preference-based reinforcement learning (RL) provides a framework to train AI agents using human feedback through preferences over pairs of behaviors, enabling agents to learn desired behaviors when it is difficult to specify a numerical…

Human-Computer Interaction · Computer Science 2025-03-21 David Chhan, Ellen Novoseller, Vernon J. Lawhern

The Value of Reward Lookahead in Reinforcement Learning

In reinforcement learning (RL), agents sequentially interact with changing environments while aiming to maximize the obtained rewards. Usually, rewards are observed only after acting, and so the goal is to maximize the expected cumulative…

Machine Learning · Computer Science 2024-10-15 Nadav Merlis, Dorian Baudry, Vianney Perchet