Related papers: Quantifying Differences in Reward Functions

Dynamics-Aware Comparison of Learned Reward Functions

The ability to learn reward functions plays an important role in enabling the deployment of intelligent agents in the real world. However, comparing reward functions, for example as a means of evaluating reward learning methods, presents a…

Machine Learning · Computer Science 2022-01-26 Blake Wulfe, Ashwin Balakrishna, Logan Ellis, Jean Mercat, Rowan McAllister, Adrien Gaidon

Invariance in Policy Optimisation and Partial Identifiability in Reward Learning

It is often very challenging to manually design reward functions for complex, real-world tasks. To solve this, one can instead use reward learning to infer a reward function from data. However, there are often multiple reward functions that…

Machine Learning · Computer Science 2023-06-08 Joar Skalse, Matthew Farrugia-Roberts, Stuart Russell, Alessandro Abate, Adam Gleave

STARC: A General Framework For Quantifying Differences Between Reward Functions

In order to solve a task using reinforcement learning, it is necessary to first formalise the goal of that task as a reward function. However, for many real-world tasks, it is very difficult to manually specify a reward function that never…

Machine Learning · Computer Science 2024-12-13 Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate

Learning Preferences for Interactive Autonomy

When robots enter everyday human environments, they need to understand their tasks and how they should perform those tasks. To encode these, reward functions, which specify the objective of a robot, are employed. However, designing reward…

Robotics · Computer Science 2022-10-21 Erdem Bıyık

Environment Probing Interaction Policies

A key challenge in reinforcement learning (RL) is environment generalization: a policy trained to solve a task in one environment often fails to solve the same task in a slightly different test environment. A common approach to improve…

Robotics · Computer Science 2019-07-30 Wenxuan Zhou, Lerrel Pinto, Abhinav Gupta

Efficient Language-instructed Skill Acquisition via Reward-Policy Co-Evolution

The ability to autonomously explore and resolve tasks with minimal human guidance is crucial for the self-development of embodied intelligence. Although reinforcement learning methods can largely ease human effort, it's challenging to…

Robotics · Computer Science 2024-12-19 Changxin Huang, Yanbin Chang, Junfan Lin, Junyang Liang, Runhao Zeng, Jianqiang Li

Reward-Conditioned Policies

Reinforcement learning offers the promise of automating the acquisition of complex behavioral skills. However, compared to commonly used and well-understood supervised learning methods, reinforcement learning algorithms can be brittle,…

Machine Learning · Computer Science 2020-01-01 Aviral Kumar, Xue Bin Peng, Sergey Levine

Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance…

Artificial Intelligence · Computer Science 2018-06-26 Daniel S. Brown, Scott Niekum

Adversarial Imitation via Variational Inverse Reinforcement Learning

We consider a problem of learning the reward and policy from expert examples under unknown dynamics. Our proposed method builds on the framework of generative adversarial networks and introduces the empowerment-regularized maximum-entropy…

Machine Learning · Computer Science 2019-02-26 Ahmed H. Qureshi, Byron Boots, Michael C. Yip

Policy Learning for Balancing Short-Term and Long-Term Rewards

Empirical researchers and decision-makers spanning various domains frequently seek profound insights into the long-term impacts of interventions. While the significance of long-term outcomes is undeniable, an overemphasis on them may…

Machine Learning · Computer Science 2024-09-17 Peng Wu, Ziyu Shen, Feng Xie, Zhongyao Wang, Chunchen Liu, Yan Zeng

Contrastive Explanations for Comparing Preferences of Reinforcement Learning Agents

In complex tasks where the reward function is not straightforward and consists of a set of objectives, multiple reinforcement learning (RL) policies that perform the task adequately but employ different strategies can be trained by adjusting…

Artificial Intelligence · Computer Science 2021-12-20 Jasmina Gajcin, Rahul Nair, Tejaswini Pedapati, Radu Marinescu, Elizabeth Daly, Ivana Dusparic

Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation

We consider the problem of imitation learning from a finite set of expert trajectories, without access to reinforcement signals. The classical approach of extracting the expert's reward function via inverse reinforcement learning, followed…

Machine Learning · Computer Science 2019-06-10 Ruohan Wang, Carlo Ciliberto, Pierluigi Amadori, Yiannis Demiris

Projection-Based Constrained Policy Optimization

We consider the problem of learning control policies that optimize a reward function while satisfying constraints due to considerations of safety, fairness, or other costs. We propose a new algorithm, Projection-Based Constrained Policy…

Machine Learning · Computer Science 2020-10-08 Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge

Reinforcement Learning Policies in Continuous-Time Linear Systems

Linear dynamical systems that obey stochastic differential equations are canonical models. While optimal control of known systems has a rich literature, the problem is technically hard under model uncertainty and there are hardly any…

Systems and Control · Electrical Eng. & Systems 2023-06-09 Mohamad Kazem Shirani Faradonbeh, Mohamad Sadegh Shirani Faradonbeh

Informativeness of Reward Functions in Reinforcement Learning

Reward functions are central in specifying the task we want a reinforcement learning agent to perform. Given a task and desired optimal behavior, we study the problem of designing informative reward functions so that the designed rewards…

Machine Learning · Computer Science 2024-02-13 Rati Devidze, Parameswaran Kamalaruban, Adish Singla

Capturing Individual Human Preferences with Reward Features

Reinforcement learning from human feedback usually models preferences using a reward function that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for…

Artificial Intelligence · Computer Science 2026-02-20 André Barreto, Vincent Dumoulin, Yiran Mao, Mark Rowland, Nicolas Perez-Nieves, Bobak Shahriari, Yann Dauphin, Doina Precup, Hugo Larochelle

Towards Learning Reward Functions from User Interactions

In the physical world, people have dynamic preferences, e.g., the same situation can lead to satisfaction for some humans and to frustration for others. Personalization is called for. The same observation holds for online behavior with…

Information Retrieval · Computer Science 2017-08-16 Ziming Li, Julia Kiseleva, Maarten de Rijke, Artem Grotov

Experiment Planning with Function Approximation

We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where there is a significant overhead to deploying adaptive algorithms -- for example, when the execution of the data…

Machine Learning · Computer Science 2024-01-11 Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill

On The Fragility of Learned Reward Functions

Reward functions are notoriously difficult to specify, especially for tasks with complex goals. Reward learning approaches attempt to infer reward functions from human feedback and preferences. Prior works on reward learning have mainly…

Machine Learning · Computer Science 2023-01-11 Lev McKinney, Yawen Duan, David Krueger, Adam Gleave

Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning with Average and Discounted Rewards

As the operations of autonomous systems generally affect simultaneously several users, it is crucial that their designs account for fairness considerations. In contrast to standard (deep) reinforcement learning (RL), we investigate the…

Artificial Intelligence · Computer Science 2020-08-19 Umer Siddique, Paul Weng, Matthieu Zimmer