Efficient Exploration for LLMs

Vikranth Dwaracherla; Seyed Mohammad Asghari; Botao Hao; Benjamin Van Roy

Machine Learning | 2024-06-06 | v2 | Artificial Intelligence; Computation and Language; Methodology; Machine Learning

Abstract

We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.
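The query-selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give algorithmic details, so the function name `double_thompson_sample`, the `max_tries` cap, the uniform-random fallback, and the use of a small ensemble of reward estimates as a stand-in for an epistemic neural network are all assumptions made for illustration. The idea shown is that each response in a query pair is the best candidate under an independently sampled plausible reward function, with resampling until the two responses differ.

```python
import numpy as np


def double_thompson_sample(reward_samples, rng, max_tries=30):
    """Pick two distinct candidate indices to show to a human rater.

    reward_samples has shape (num_models, num_candidates). Each row is one
    plausible reward function evaluated on every candidate response. Here an
    ensemble member stands in for one draw from an epistemic neural network
    (an assumption of this sketch).
    """
    num_models, num_candidates = reward_samples.shape
    if num_candidates < 2:
        raise ValueError("need at least two candidate responses")

    # First response: the best candidate under one sampled reward function.
    first = int(np.argmax(reward_samples[rng.integers(num_models)]))

    # Second response: resample reward functions until the best candidate
    # differs from the first one.
    for _ in range(max_tries):
        second = int(np.argmax(reward_samples[rng.integers(num_models)]))
        if second != first:
            return first, second

    # Fallback (assumption): all sampled models agree, so pick the second
    # response uniformly at random among the remaining candidates.
    others = [i for i in range(num_candidates) if i != first]
    return first, int(rng.choice(others))


rng = np.random.default_rng(0)
# Hypothetical data: 10 ensemble members scoring 5 candidate responses.
samples = rng.normal(size=(10, 5))
pair = double_thompson_sample(samples, rng)
```

When the sampled reward functions disagree about which response is best, the pair tends to contain responses whose ranking is still uncertain, which is where a preference label is most informative for fitting the reward model.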

Keywords

multi-agent systems; information retrieval; language modeling

Cite

@article{arxiv.2402.00396,
  title  = {Efficient Exploration for LLMs},
  author = {Vikranth Dwaracherla and Seyed Mohammad Asghari and Botao Hao and Benjamin Van Roy},
  journal= {arXiv preprint arXiv:2402.00396},
  year   = {2024}
}

Comments

Accepted at ICML 2024
