Related papers: RL agents Implicitly Learning Human Preferences

200 papers

For AI systems to be useful to humans, they must understand and act in accordance with our values and preferences. Since specifying preferences is a hard task, inverse reinforcement learning (IRL) aims to develop methods that allow for…

Artificial Intelligence · Computer Science 2026-05-12 Karim Abdel Sadek, Mark Bedaywi, Rhys Gould, Stuart Russell

Reinforcement learning (RL) agents optimize only the features specified in a reward function and are indifferent to anything left out inadvertently. This means that we must not only specify what to do, but also the much larger space of what…

Machine Learning · Computer Science 2019-04-22 Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan

The challenge of developing powerful and general Reinforcement Learning (RL) agents has received increasing attention in recent years. Much of this effort has focused on the single-agent setting, in which an agent maximizes a predefined…

Machine Learning · Computer Science 2020-10-21 Jiachen Yang, Ang Li, Mehrdad Farajtabar, Peter Sunehag, Edward Hughes, Hongyuan Zha

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the…

Machine Learning · Computer Science 2022-02-18 Daniel Shin, Daniel S. Brown, Anca D. Dragan

The aim of inverse reinforcement learning (IRL) is to infer an agent's preferences from observing their behaviour. Usually, preferences are modelled as a reward function, $R$, and behaviour is modelled as a policy, $\pi$. One of the central…

Machine Learning · Computer Science 2024-12-17 Joar Skalse, Alessandro Abate

The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques. Describing an agent's desired behaviors and properties can be difficult, even for experts. A new…

Machine Learning · Computer Science 2024-05-09 Wanqi Xue, Bo An, Shuicheng Yan, Zhongwen Xu

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human…

Machine Learning · Statistics 2023-02-20 Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

In complex tasks where the reward function is not straightforward and consists of a set of objectives, multiple reinforcement learning (RL) policies that perform the task adequately but employ different strategies can be trained by adjusting…

Artificial Intelligence · Computer Science 2021-12-20 Jasmina Gajcin, Rahul Nair, Tejaswini Pedapati, Radu Marinescu, Elizabeth Daly, Ivana Dusparic

This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics for instance, the expert can hardly design a reward function or demonstrate the target behavior, forbidding the use of both…

Machine Learning · Computer Science 2012-08-07 Riad Akrour, Marc Schoenauer, Michèle Sebag

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the…

Machine Learning · Computer Science 2023-01-05 Daniel Shin, Anca D. Dragan, Daniel S. Brown

Reward models (RMs) are crucial for the training and inference-time scaling up of large language models (LLMs). However, existing reward models primarily focus on human preferences, neglecting verifiable correctness signals which have shown…

Computation and Language · Computer Science 2025-02-27 Hao Peng, Yunjia Qi, Xiaozhi Wang, Zijun Yao, Bin Xu, Lei Hou, Juanzi Li

Inverse reinforcement learning (IRL) is a common technique for inferring human preferences from data. Standard IRL techniques tend to assume that the human demonstrator is stationary, that is, that their policy $\pi$ doesn't change over…

Machine Learning · Computer Science 2020-12-02 Harry Giles, Lawrence Chan

Preference-based reinforcement learning (RL) provides a framework to train agents using human preferences between two behaviors. However, preference-based RL has been challenging to scale since it requires a large amount of human feedback…

Machine Learning · Computer Science 2023-03-03 Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee

Our goal is for agents to optimize the right reward function, despite how difficult it is for us to specify what that is. Inverse Reinforcement Learning (IRL) enables us to infer reward functions from demonstrations, but it usually assumes…

Machine Learning · Computer Science 2019-06-25 Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan

Reinforcement learning (RL) systems can be complex and non-interpretable, making it challenging for non-AI experts to understand or intervene in their decisions. This is due in part to the sequential nature of RL in which actions are chosen…

Artificial Intelligence · Computer Science 2025-04-16 Amal Alabdulkarim, Madhuri Singh, Gennie Mansi, Kaely Hall, Upol Ehsan, Mark O. Riedl

Recent work in AI safety has highlighted that in sequential decision making, objectives are often underspecified or incomplete. This gives discretion to the acting agent to realize the stated objective in ways that may result in undesirable…

Artificial Intelligence · Computer Science 2021-06-07 Parand Alizadeh Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith

Reinforcement Learning (RL) is known to be often unsuccessful in environments with sparse extrinsic rewards. A possible countermeasure is to endow RL agents with an intrinsic reward function, or 'intrinsic motivation', which rewards the…

Artificial Intelligence · Computer Science 2021-07-16 Francesco Massari, Martin Biehl, Lisa Meeden, Ryota Kanai

Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives. Other researchers point out that RL agents need not have human-like…

Artificial Intelligence · Computer Science 2023-01-31 Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli

Preference-based reinforcement learning (RL) provides a framework to train AI agents using human feedback through preferences over pairs of behaviors, enabling agents to learn desired behaviors when it is difficult to specify a numerical…

Human-Computer Interaction · Computer Science 2025-03-21 David Chhan, Ellen Novoseller, Vernon J. Lawhern

In reinforcement learning (RL), agents sequentially interact with changing environments while aiming to maximize the obtained rewards. Usually, rewards are observed only after acting, and so the goal is to maximize the expected cumulative…

Machine Learning · Computer Science 2024-10-15 Nadav Merlis, Dorian Baudry, Vianney Perchet