PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm

Toygun Basaklar; Suat Gumussoy; Umit Y. Ogras

Machine Learning 2023-05-31 v3 Artificial Intelligence

Abstract

Multi-objective reinforcement learning (MORL) approaches have emerged to tackle many real-world problems with multiple conflicting objectives by maximizing a joint objective function weighted by a preference vector. These approaches find fixed, customized policies corresponding to the preference vectors specified during training. However, design constraints and objectives typically change dynamically in real-life scenarios. Furthermore, storing a policy for each potential preference is not scalable. Hence, obtaining a set of Pareto front solutions for the entire preference space in a given domain with a single training run is critical. To this end, we propose a novel MORL algorithm that trains a single universal network to cover the entire preference space and scales to continuous robotic tasks. The proposed approach, Preference-Driven MORL (PD-MORL), utilizes the preferences as guidance to update the network parameters. It also employs a novel parallelization approach to increase sample efficiency. We show that PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches.
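
The two quantities the abstract relies on, a joint objective weighted by a preference vector and the hypervolume of a set of solutions, can be illustrated with a minimal sketch. This is not the PD-MORL algorithm itself: the function names `scalarize` and `hypervolume_2d`, the three candidate return vectors, and the reference point are all hypothetical, and the sketch assumes a linear weighting and exactly two objectives that are both maximized.

```python
def scalarize(returns, preference):
    """Joint objective: weighted sum of per-objective returns (linear weighting assumed)."""
    return sum(w * r for w, r in zip(preference, returns))


def hypervolume_2d(points, reference):
    """Area dominated by `points` relative to `reference` (two objectives, maximization)."""
    ref_x, ref_y = reference
    # Only points that improve on the reference in both objectives contribute.
    kept = [(x, y) for x, y in points if x > ref_x and y > ref_y]
    # Sweep from the largest first objective downwards, adding a rectangle
    # each time the second objective improves on the best value seen so far.
    kept.sort(key=lambda p: p[0], reverse=True)
    area, best_y = 0.0, ref_y
    for x, y in kept:
        if y > best_y:
            area += (x - ref_x) * (y - best_y)
            best_y = y
    return area


# Hypothetical return vectors of three candidate policies (illustrative numbers only).
candidate_returns = [(1.0, 4.0), (3.0, 3.0), (4.0, 1.0)]


def best_candidate(preference):
    """Index of the candidate with the highest scalarized return for this preference."""
    scores = [scalarize(r, preference) for r in candidate_returns]
    return scores.index(max(scores))


if __name__ == "__main__":
    for omega in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
        print(omega, "->", candidate_returns[best_candidate(omega)])
    print("hypervolume:", hypervolume_2d(candidate_returns, (0.0, 0.0)))
```

Changing the preference vector changes which candidate maximizes the joint objective, which is why a method fixed to a few training-time preferences cannot serve preferences that shift later. The hypervolume summarizes how much of the objective space a whole set of solutions covers, so a larger value indicates a better approximation of the Pareto front.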

Keywords

multi-view learning; multi-objective optimization; reinforcement learning

Cite

@article{arxiv.2208.07914,
  title  = {PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm},
  author = {Toygun Basaklar and Suat Gumussoy and Umit Y. Ogras},
  journal= {arXiv preprint arXiv:2208.07914},
  year   = {2023}
}

Comments

24 pages, 8 Figures, 9 Tables, Published as a conference paper at ICLR 2023, https://openreview.net/forum?id=zS9sRyaPFlJ
