
Evaluating Agents without Rewards

Machine Learning 2021-02-11 v2 Artificial Intelligence Robotics

Abstract

Reinforcement learning has enabled agents to solve challenging tasks in unknown environments. However, manually crafting reward functions can be time-consuming, expensive, and error-prone. Competing objectives have been proposed for agents to learn without external supervision, but it has been unclear how well they reflect task rewards or human behavior. To accelerate the development of intrinsic objectives, we retrospectively compute potential objectives on pre-collected datasets of agent behavior, rather than optimizing them online, and compare them by analyzing their correlations. We study input entropy, information gain, and empowerment across seven agents, three Atari games, and the 3D game Minecraft. We find that all three intrinsic objectives correlate more strongly with a human behavior similarity metric than with task reward. Moreover, input entropy and information gain correlate more strongly with human similarity than task reward does, suggesting the use of intrinsic objectives for designing agents that behave similarly to human players.
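The retrospective evaluation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes each agent's observations have already been discretized into integer symbols, estimates input entropy from empirical counts, and correlates the per-agent entropies with a task reward. The function names (`input_entropy`, `pearson`), the synthetic datasets, and the reward values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def input_entropy(observations: np.ndarray) -> float:
    """Shannon entropy (in nats) of the empirical distribution over discrete inputs."""
    _, counts = np.unique(observations, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log(probs)).sum())


def pearson(x, y) -> float:
    """Pearson correlation between two equally long sequences of per-agent scores."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


# Synthetic stand-ins for pre-collected datasets (assumption): each hypothetical
# agent visits a different number of distinct discretized inputs.
rng = np.random.default_rng(0)
num_symbols = [2, 4, 8, 16]
datasets = [rng.integers(0, k, size=5000) for k in num_symbols]

# Objectives are computed after the fact on stored data, not optimized online.
entropies = [input_entropy(d) for d in datasets]

# Made-up task rewards for the same four agents (assumption, not paper data).
task_reward = [1.0, 2.5, 3.0, 4.5]

corr = pearson(entropies, task_reward)
print(f"input entropy per agent: {np.round(entropies, 3)}")
print(f"correlation with task reward: {corr:.3f}")
```

The same pattern would extend to other objectives or to a human similarity score by swapping the per-agent values passed to `pearson`. Information gain and empowerment need learned models and are left out of this sketch.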


Cite

@article{arxiv.2012.11538,
  title  = {Evaluating Agents without Rewards},
  author = {Brendon Matusch and Jimmy Ba and Danijar Hafner},
  journal= {arXiv preprint arXiv:2012.11538},
  year   = {2021}
}

Comments

15 pages, 6 figures, 5 tables
