Related papers: Preprocessing Reward Functions for Interpretabilit…

Understanding Learned Reward Functions

In many real-world tasks, it is not possible to procedurally specify an RL agent's reward function. In such cases, a reward function must instead be learned from interacting with and observing humans. However, current techniques for reward…

Machine Learning · Computer Science 2020-12-11 Eric J. Michaud, Adam Gleave, Stuart Russell

Unsupervised Perceptual Rewards for Imitation Learning

Reward function design and exploration time are arguably the biggest obstacles to the deployment of reinforcement learning (RL) agents in the real world. In many real-world tasks, designing a reward function takes considerable hand…

Computer Vision and Pattern Recognition · Computer Science 2017-06-14 Pierre Sermanet, Kelvin Xu, Sergey Levine

Reward-rational (implicit) choice: A unifying formalism for reward learning

It is often difficult to hand-specify what the correct reward function is for a task, so researchers have instead aimed to learn reward functions from human behavior or feedback. The types of behavior interpreted as evidence of the reward…

Machine Learning · Computer Science 2020-12-14 Hong Jun Jeon, Smitha Milli, Anca D. Dragan

On The Fragility of Learned Reward Functions

Reward functions are notoriously difficult to specify, especially for tasks with complex goals. Reward learning approaches attempt to infer reward functions from human feedback and preferences. Prior works on reward learning have mainly…

Machine Learning · Computer Science 2023-01-11 Lev McKinney, Yawen Duan, David Krueger, Adam Gleave

Perceptual Reward Functions

Reinforcement learning problems are often described through rewards that indicate if an agent has completed some task. This specification can yield desirable behavior; however, many problems are difficult to specify in this manner, as one…

Artificial Intelligence · Computer Science 2016-08-15 Ashley Edwards, Charles Isbell, Atsuo Takanishi

Reward Learning with Trees: Methods and Evaluation

Recent efforts to learn reward functions from human feedback have tended to use deep neural networks, whose lack of transparency hampers our ability to explain agent behaviour or verify alignment. We explore the merits of learning…

Machine Learning · Computer Science 2022-10-04 Tom Bewley, Jonathan Lawry, Arthur Richards, Rachel Craddock, Ian Henderson

Reward Model Interpretability via Optimal and Pessimal Tokens

Reward modeling has emerged as a crucial component in aligning large language models with human values. Significant attention has focused on using reward models as a means for fine-tuning generative models. However, the reward models…

Computation and Language · Computer Science 2026-02-04 Brian Christian, Hannah Rose Kirk, Jessica A. F. Thompson, Christopher Summerfield, Tsvetomira Dumbalska

Explaining Reward Functions to Humans for Better Human-Robot Collaboration

Explainable AI techniques that describe agent reward functions can enhance human-robot collaboration in a variety of settings. One context where human understanding of agent reward functions is particularly beneficial is in the value…

Robotics · Computer Science 2021-10-11 Lindsay Sanneman, Julie Shah

Learning Preferences for Interactive Autonomy

When robots enter everyday human environments, they need to understand their tasks and how they should perform those tasks. To encode these, reward functions, which specify the objective of a robot, are employed. However, designing reward…

Robotics · Computer Science 2022-10-21 Erdem Bıyık

Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences

Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from…

Robotics · Computer Science 2021-08-05 Erdem Bıyık, Dylan P. Losey, Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh

Informativeness of Reward Functions in Reinforcement Learning

Reward functions are central in specifying the task we want a reinforcement learning agent to perform. Given a task and desired optimal behavior, we study the problem of designing informative reward functions so that the designed rewards…

Machine Learning · Computer Science 2024-02-13 Rati Devidze, Parameswaran Kamalaruban, Adish Singla

Preference-based Learning of Reward Function Features

Preference-based learning of reward functions, where the reward function is learned using comparison data, has been well studied for complex robotic tasks such as autonomous driving. Existing algorithms have focused on learning reward…

Robotics · Computer Science 2021-03-05 Sydney M. Katz, Amir Maleki, Erdem Bıyık, Mykel J. Kochenderfer

Towards Learning Reward Functions from User Interactions

In the physical world, people have dynamic preferences, e.g., the same situation can lead to satisfaction for some humans and to frustration for others. Personalization is called for. The same observation holds for online behavior with…

Information Retrieval · Computer Science 2017-08-16 Ziming Li, Julia Kiseleva, Maarten de Rijke, Artem Grotov

Programmatic Reward Design by Example

Reward design is a fundamental problem in reinforcement learning (RL). A misspecified or poorly designed reward can result in low sample efficiency and undesired behaviors. In this paper, we propose the idea of programmatic reward design,…

Machine Learning · Computer Science 2022-01-10 Weichao Zhou, Wenchao Li

Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond

Deep neural networks have been well-known for their superb handling of various machine learning and artificial intelligence tasks. However, due to their over-parameterized black-box nature, it is often difficult to understand the prediction…

Machine Learning · Computer Science 2022-07-18 Xuhong Li, Haoyi Xiong, Xingjian Li, Xuanyu Wu, Xiao Zhang, Ji Liu, Jiang Bian, Dejing Dou

The Promise and Peril of Human Evaluation for Model Interpretability

Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real world applications.…

Artificial Intelligence · Computer Science 2019-10-31 Bernease Herman

An Evaluation of the Human-Interpretability of Explanation

Recent years have seen a boom in interest in machine learning systems that can provide a human-understandable rationale for their predictions or decisions. However, exactly what kinds of explanation are truly human-interpretable remains…

Machine Learning · Computer Science 2019-08-30 Isaac Lage, Emily Chen, Jeffrey He, Menaka Narayanan, Been Kim, Sam Gershman, Finale Doshi-Velez

Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input

Humans use social context to specify preferences over behaviors, i.e. their reward functions. Yet, algorithms for inferring reward models from preference data do not take this social learning view into account. Inspired by pragmatic human…

Machine Learning · Computer Science 2024-05-24 Andi Peng, Yuying Sun, Tianmin Shu, David Abel

A Survey on Interpretable Reinforcement Learning

Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stake domains such as autonomous driving or medical applications. In such…

Machine Learning · Computer Science 2022-02-25 Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang, Jianye Hao, Wulong Liu

Techniques for Interpretable Machine Learning

Interpretable machine learning tackles the important problem that humans cannot understand the behaviors of complex machine learning models and how these models arrive at a particular decision. Although many approaches have been proposed, a…

Machine Learning · Computer Science 2019-05-21 Mengnan Du, Ninghao Liu, Xia Hu