Parameter Efficient Reinforcement Learning from Human Feedback

Machine Learning | 2024-09-16 | v2 | Artificial Intelligence | Computation and Language

Abstract

While Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language and Vision-Language Models (LLMs and VLMs) with human preferences, its computational cost and complexity hamper its wider adoption. To alleviate some of the computational burden of fine-tuning, parameter-efficient methods such as LoRA were introduced. In this work, we empirically evaluate Parameter Efficient Reinforcement Learning from Human Feedback (PE-RLHF), a setup that uses LoRA fine-tuning for both reward modeling and reinforcement learning. We benchmark PE-RLHF on six diverse datasets spanning summarization, harmless/helpful response generation, UI automation, and visual question answering, measuring both the effectiveness of the trained models and the training resources required. Our findings show, for the first time, that PE-RLHF achieves performance comparable to RLHF while significantly reducing training time (up to 90% faster for reward models and 30% faster for RL) and memory footprint (up to 50% reduction for reward models and 27% for RL). We provide comprehensive ablations across LoRA ranks and model sizes for both reward modeling and reinforcement learning. By mitigating the computational burden associated with RLHF, we push for a broader adoption of PE-RLHF as an alignment technique for LLMs and VLMs.
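
For readers unfamiliar with LoRA, the following is a minimal sketch of the general idea the abstract relies on: the pretrained weight matrix stays frozen and only a low-rank update is trained. It is not the authors' implementation. The function names, the row-vector convention (x @ W), the alpha / rank scaling, and the zero initialization of one factor are assumptions for illustration; the abstract does not specify ranks, scaling, or which layers receive adapters.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_lora(d_in, d_out, rank, seed=0):
    """Create illustrative LoRA factors: A is small random, B starts at zero."""
    rng = random.Random(seed)
    lora_a = [[rng.gauss(0.0, 0.01) for _ in range(rank)] for _ in range(d_in)]
    lora_b = [[0.0] * d_out for _ in range(rank)]
    return lora_a, lora_b


def lora_forward(x, w_frozen, lora_a, lora_b, alpha):
    """Compute x @ W + (alpha / rank) * (x @ A) @ B.

    w_frozen: d_in x d_out pretrained weights, never updated.
    lora_a:   d_in x rank, trainable.
    lora_b:   rank x d_out, trainable.
    """
    rank = len(lora_b)
    scale = alpha / rank
    base = matmul(x, w_frozen)
    update = matmul(matmul(x, lora_a), lora_b)
    return [
        [b + scale * u for b, u in zip(base_row, update_row)]
        for base_row, update_row in zip(base, update)
    ]


def trainable_fraction(d_in, d_out, rank):
    """Trainable LoRA parameters as a fraction of the full weight matrix."""
    return rank * (d_in + d_out) / (d_in * d_out)
```

Under these assumptions, a rank-4 adapter on a 1024 x 1024 matrix trains 4 * (1024 + 1024) = 8192 parameters, which `trainable_fraction(1024, 1024, 4)` reports as 0.0078125, or under 1% of the full matrix. Because B starts at zero, the adapted layer initially reproduces the frozen model's output exactly.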

Cite

@article{arxiv.2403.10704,
  title  = {Parameter Efficient Reinforcement Learning from Human Feedback},
  author = {Hakim Sidahmed and Samrat Phatale and Alex Hutcheson and Zhuonan Lin and Zhang Chen and Zac Yu and Jarvis Jin and Simral Chaudhary and Roman Komarytsia and Christiane Ahlheim and Yonghao Zhu and Bowen Li and Saravanan Ganesh and Bill Byrne and Jessica Hoffmann and Hassan Mansoor and Wei Li and Abhinav Rastogi and Lucas Dixon},
  journal= {arXiv preprint arXiv:2403.10704},
  year   = {2024}
}