Artificial Intelligence · Computer Science
Learning the Preferences of a Learning Agent
Karim Abdel Sadek, Mark Bedaywi, Rhys Gould, Stuart Russell
2026-05-12
Machine Learning · Computer Science
Preferences Implicit in the State of the World
Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel +1
2019-04-22
Machine Learning · Computer Science
Learning to Incentivize Other Learning Agents
Jiachen Yang, Ang Li, Mehrdad Farajtabar, Peter Sunehag +2
2020-10-21
Machine Learning · Computer Science
Reinforcement Learning from Diverse Human Preferences
Wanqi Xue, Bo An, Shuicheng Yan, Zhongwen Xu
2024-05-09
Machine Learning · Statistics
Deep reinforcement learning from human preferences
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic +2
2023-02-20
Artificial Intelligence · Computer Science
Contrastive Explanations for Comparing Preferences of Reinforcement Learning Agents
Jasmina Gajcin, Rahul Nair, Tejaswini Pedapati, Radu Marinescu +2
2021-12-20
Computation and Language · Computer Science
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng, Yunjia Qi, Xiaozhi Wang, Zijun Yao +3
2025-02-27
Machine Learning · Computer Science
Preference Transformer: Modeling Human Preferences using Transformers for RL
Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee +2
2023-03-03
Machine Learning · Computer Science
On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference
Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan
2019-06-25
Artificial Intelligence · Computer Science
Experiential Explanations for Reinforcement Learning
Amal Alabdulkarim, Madhuri Singh, Gennie Mansi, Kaely Hall +2
2025-04-16
Artificial Intelligence · Computer Science
Be Considerate: Objectives, Side Effects, and Deciding How to Act
Parand Alizadeh Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith
2021-06-07
Artificial Intelligence · Computer Science
Experimental Evidence that Empowerment May Drive Exploration in Sparse-Reward Environments
Francesco Massari, Martin Biehl, Lisa Meeden, Ryota Kanai
2021-07-16
Artificial Intelligence · Computer Science
Optimal Policies Tend to Seek Power
Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch +1
2023-01-31
Artificial Intelligence · Computer Science
Learning to Understand Goal Specifications by Modelling Reward
Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes +3
2019-12-24
Machine Learning · Computer Science
On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness
Haotian Ye, Xiaoyu Chen, Liwei Wang, Simon S. Du
2023-06-30