Related papers: Avoiding Side Effects in Complex Environments

On Avoiding Power-Seeking by Artificial Intelligence

We do not know how to align a very intelligent AI agent's behavior with human interests. I investigate whether -- absent a full solution to this AI alignment problem -- we can build smart AI agents which have limited impact on the world,…

Artificial Intelligence · Computer Science 2022-06-24 Alexander Matt Turner

Avoiding Side Effects By Considering Future Tasks

Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the…

Machine Learning · Computer Science 2020-10-16 Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg

Conservative Agency via Attainable Utility Preservation

Reward functions are easy to misspecify; although designers can make corrections after observing mistakes, an agent pursuing a misspecified reward function can irreversibly change the state of its environment. If that change precludes…

Artificial Intelligence · Computer Science 2020-06-11 Alexander Matt Turner, Dylan Hadfield-Menell, Prasad Tadepalli

Penalizing side effects using stepwise relative reachability

How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment? We show that current approaches to penalizing side effects can introduce bad incentives, e.g. to prevent any irreversible changes…

Machine Learning · Computer Science 2019-03-11 Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg

Continuously evolving rewards in an open-ended environment

Unambiguous identification of the rewards driving behaviours of entities operating in complex open-ended real-world environments is difficult, partly because goals and associated behaviours emerge endogenously and are dynamically updated as…

Machine Learning · Computer Science 2024-05-03 Richard M. Bailey

Challenges for Using Impact Regularizers to Avoid Negative Side Effects

Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward also has to discourage undesired outcomes. Misspecified reward functions can lead to unintended…

Machine Learning · Computer Science 2021-02-24 David Lindner, Kyle Matoba, Alexander Meulemans

SafeLife 1.0: Exploring Side Effects in Complex Environments

We present SafeLife, a publicly available reinforcement learning environment that tests the safety of reinforcement learning agents. It contains complex, dynamic, tunable, procedurally generated levels with many opportunities for unsafe…

Artificial Intelligence · Computer Science 2021-03-01 Carroll L. Wainwright, Peter Eckersley

Assisted Robust Reward Design

Real-world robotic tasks require complex reward functions. When we define the problem the robot needs to solve, we pretend that a designer specifies this complex reward exactly, and it is set in stone from then on. In practice, however,…

Robotics · Computer Science 2021-11-19 Jerry Zhi-Yang He, Anca D. Dragan

Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems

Autonomous agents acting in the real-world often operate based on models that ignore certain aspects of the environment. The incompleteness of any given model -- handcrafted or machine acquired -- is inevitable due to practical limitations…

Computers and Society · Computer Science 2021-10-20 Sandhya Saisubramanian, Shlomo Zilberstein, Ece Kamar

Exploiting Contextual Structure to Generate Useful Auxiliary Tasks

Reinforcement learning requires interaction with an environment, which is expensive for robots. This constraint necessitates approaches that work with limited environmental interaction by maximizing the reuse of previous experiences. We…

Artificial Intelligence · Computer Science 2024-04-05 Benedict Quartey, Ankit Shah, George Konidaris

Tiered Reward: Designing Rewards for Specification and Fast Learning of Desired Behavior

Reinforcement-learning agents seek to maximize a reward signal through environmental interactions. As humans, our job in the learning process is to design reward functions to express desired behavior and enable the agent to learn such…

Machine Learning · Computer Science 2024-08-08 Zhiyuan Zhou, Shreyas Sundara Raman, Henry Sowerby, Michael L. Littman

Continual Auxiliary Task Learning

Learning auxiliary tasks, such as multiple predictions about the world, can provide many benefits to reinforcement learning systems. A variety of off-policy learning algorithms have been developed to learn such predictions, but as yet there…

Machine Learning · Computer Science 2022-02-24 Matthew McLeod, Chunlok Lo, Matthew Schlegel, Andrew Jacobsen, Raksha Kumaraswamy, Martha White, Adam White

Pitfalls of learning a reward function online

In some agent designs like inverse reinforcement learning an agent needs to learn its own reward function. Learning the reward function and optimising for it are typically two different processes, usually performed at different stages. We…

Artificial Intelligence · Computer Science 2020-04-29 Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg

Inverse Reward Design

Autonomous agents optimize the reward function we give them. What they don't know is how hard it is for us to design a reward function that actually captures what we want. When designing the reward, we might think of some specific training…

Artificial Intelligence · Computer Science 2020-10-08 Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan

Behavior Alignment via Reward Function Optimization

Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that…

Machine Learning · Computer Science 2023-11-01 Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

Synthesis of Reward Machines for Multi-Agent Equilibrium Design (Full Version)

Mechanism design is a well-established game-theoretic paradigm for designing games to achieve desired outcomes. This paper addresses a closely related but distinct concept, equilibrium design. Unlike mechanism design, the designer's…

Computer Science and Game Theory · Computer Science 2024-08-20 Muhammad Najib, Giuseppe Perelli

Reinforcement Learning with Unsupervised Auxiliary Tasks

Deep reinforcement learning agents have achieved state-of-the-art results by directly maximising cumulative reward. However, environments contain a much wider variety of possible training signals. In this paper, we introduce an agent that…

Machine Learning · Computer Science 2016-11-17 Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver, Koray Kavukcuoglu

Learning in Multi-Objective Public Goods Games with Non-Linear Utilities

Addressing the question of how to achieve optimal decision-making under risk and uncertainty is crucial for enhancing the capabilities of artificial agents that collaborate with or support humans. In this work, we address this question in…

Multiagent Systems · Computer Science 2024-08-02 Nicole Orzan, Erman Acar, Davide Grossi, Patrick Mannion, Roxana Rădulescu

Learning To Explore With Predictive World Model Via Self-Supervised Learning

Autonomous artificial agents must be able to learn behaviors in complex environments without humans to design tasks and rewards. Designing these functions for each environment is not feasible, thus, motivating the development of intrinsic…

Machine Learning · Computer Science 2025-02-20 Alana Santana, Paula P. Costa, Esther L. Colombini

Action and Perception as Divergence Minimization

To learn directed behaviors in complex environments, intelligent agents need to optimize objective functions. Various objectives are known for designing artificial agents, including task rewards and intrinsic motivation. However, it is…

Artificial Intelligence · Computer Science 2022-02-15 Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess