Rating-based Reinforcement Learning

Devin White; Mingkang Wu; Ellen Novoseller; Vernon J. Lawhern; Nicholas Waytowich; Yongcan Cao

Machine Learning; Artificial Intelligence; Robotics (2024-01-30, v2)

Abstract

This paper develops a novel rating-based reinforcement learning approach that uses human ratings as the source of human guidance. Unlike existing preference-based and ranking-based reinforcement learning paradigms, which rely on relative human preferences over sample pairs, the proposed approach relies on human evaluations of individual trajectories and requires no relative comparisons between sample pairs. The approach builds on a new prediction model for human ratings and a novel multi-class loss function. We conduct several experimental studies, using both synthetic ratings and real human ratings, to evaluate the effectiveness and benefits of the approach.

Keywords

reinforcement learning from human feedback; reinforcement learning; machine learning theory

Cite

@article{arxiv.2307.16348,
  title  = {Rating-based Reinforcement Learning},
  author = {Devin White and Mingkang Wu and Ellen Novoseller and Vernon J. Lawhern and Nicholas Waytowich and Yongcan Cao},
  journal= {arXiv preprint arXiv:2307.16348},
  year   = {2024}
}

Comments

This is an extended version of the paper "Rating-based Reinforcement Learning", accepted to the 38th Annual AAAI Conference on Artificial Intelligence.
