Related papers: Avoiding Side Effects By Considering Future Tasks

Penalizing side effects using stepwise relative reachability

How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment? We show that current approaches to penalizing side effects can introduce bad incentives, e.g. to prevent any irreversible changes…

Machine Learning · Computer Science 2019-03-11 Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg

Avoiding Side Effects in Complex Environments

Reward function specification can be difficult. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoided…

Artificial Intelligence · Computer Science 2020-10-23 Alexander Matt Turner, Neale Ratzlaff, Prasad Tadepalli

Challenges for Using Impact Regularizers to Avoid Negative Side Effects

Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward also has to discourage undesired outcomes. Misspecified reward functions can lead to unintended…

Machine Learning · Computer Science 2021-02-24 David Lindner, Kyle Matoba, Alexander Meulemans

Tiered Reward: Designing Rewards for Specification and Fast Learning of Desired Behavior

Reinforcement-learning agents seek to maximize a reward signal through environmental interactions. As humans, our job in the learning process is to design reward functions to express desired behavior and enable the agent to learn such…

Machine Learning · Computer Science 2024-08-08 Zhiyuan Zhou, Shreyas Sundara Raman, Henry Sowerby, Michael L. Littman

Assisted Robust Reward Design

Real-world robotic tasks require complex reward functions. When we define the problem the robot needs to solve, we pretend that a designer specifies this complex reward exactly, and it is set in stone from then on. In practice, however,…

Robotics · Computer Science 2021-11-19 Jerry Zhi-Yang He, Anca D. Dragan

Inverse Reward Design

Autonomous agents optimize the reward function we give them. What they don't know is how hard it is for us to design a reward function that actually captures what we want. When designing the reward, we might think of some specific training…

Artificial Intelligence · Computer Science 2020-10-08 Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan

Continual Auxiliary Task Learning

Learning auxiliary tasks, such as multiple predictions about the world, can provide many benefits to reinforcement learning systems. A variety of off-policy learning algorithms have been developed to learn such predictions, but as yet there…

Machine Learning · Computer Science 2022-02-24 Matthew McLeod, Chunlok Lo, Matthew Schlegel, Andrew Jacobsen, Raksha Kumaraswamy, Martha White, Adam White

Automatic Reward Design via Learning Motivation-Consistent Intrinsic Rewards

Reward design is a critical part of the application of reinforcement learning, the performance of which strongly depends on how well the reward signal frames the goal of the designer and how well the signal assesses progress in reaching…

Machine Learning · Computer Science 2022-08-01 Yixiang Wang, Yujing Hu, Feng Wu, Yingfeng Chen

Recognising Affordances in Predicted Futures to Plan with Consideration of Non-canonical Affordance Effects

We propose a novel system for action sequence planning based on a combination of affordance recognition and a neural forward model predicting the effects of affordance execution. By performing affordance recognition on predicted futures, we…

Robotics · Computer Science 2022-06-23 Solvi Arnold, Mami Kuroishi, Tadashi Adachi, Kimitoshi Yamazaki

Behavior Alignment via Reward Function Optimization

Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that…

Machine Learning · Computer Science 2023-11-01 Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

Admissible Policy Teaching through Reward Design

We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy from a set of admissible policies. The goal of the reward designer is to modify the underlying reward function cost-efficiently while…

Machine Learning · Computer Science 2022-01-07 Kiarash Banihashem, Adish Singla, Jiarui Gan, Goran Radanovic

Pitfalls of learning a reward function online

In some agent designs like inverse reinforcement learning an agent needs to learn its own reward function. Learning the reward function and optimising for it are typically two different processes, usually performed at different stages. We…

Artificial Intelligence · Computer Science 2020-04-29 Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg

Avoiding Death through Fear Intrinsic Conditioning

Biological and psychological concepts have inspired reinforcement learning algorithms to create new complex behaviors that expand agents' capacity. These behaviors can be seen in the rise of techniques like goal decomposition, curriculum,…

Artificial Intelligence · Computer Science 2025-06-09 Rodney Sanchez, Ferat Sahin, Alexander Ororbia, Jamison Heard

Zero-Shot Assistance in Sequential Decision Problems

We consider the problem of creating assistants that can help agents solve new sequential decision problems, assuming the agent is not able to specify the reward function explicitly to the assistant. Instead of acting in place of the agent…

Machine Learning · Computer Science 2022-12-01 Sebastiaan De Peuter, Samuel Kaski

Conservative Agency via Attainable Utility Preservation

Reward functions are easy to misspecify; although designers can make corrections after observing mistakes, an agent pursuing a misspecified reward function can irreversibly change the state of its environment. If that change precludes…

Artificial Intelligence · Computer Science 2020-06-11 Alexander Matt Turner, Dylan Hadfield-Menell, Prasad Tadepalli

Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems

Autonomous agents acting in the real world often operate based on models that ignore certain aspects of the environment. The incompleteness of any given model, whether handcrafted or machine acquired, is inevitable due to practical limitations…

Computers and Society · Computer Science 2021-10-20 Sandhya Saisubramanian, Shlomo Zilberstein, Ece Kamar

Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence

To regulate a social system comprised of self-interested agents, economic incentives are often required to induce a desirable outcome. This incentive design problem naturally possesses a bilevel structure, in which a designer modifies the…

Computer Science and Game Theory · Computer Science 2022-10-14 Boyi Liu, Jiayang Li, Zhuoran Yang, Hoi-To Wai, Mingyi Hong, Yu Marco Nie, Zhaoran Wang

Design of Reward Function on Reinforcement Learning for Automated Driving

This paper proposes a design scheme for a reward function that constantly evaluates both driving states and actions for applying reinforcement learning to automated driving. In the field of reinforcement learning, reward functions often…

Robotics · Computer Science 2025-03-24 Takeru Goto, Yuki Kizumi, Shun Iwasaki

Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective

Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question…

Artificial Intelligence · Computer Science 2021-03-29 Tom Everitt, Marcus Hutter, Ramana Kumar, Victoria Krakovna

Active Inverse Reward Design

Designers of AI agents often iterate on the reward function in a trial-and-error process until they get the desired behavior, but this only guarantees good behavior in the training environment. We propose structuring this process as a…

Machine Learning · Computer Science 2023-10-17 Sören Mindermann, Rohin Shah, Adam Gleave, Dylan Hadfield-Menell