
Large Language Model Selection with Limited Annotations

Subjects: Computation and Language; Machine Learning (2026-05-26, v1)

Abstract

Choosing a Large Language Model (LLM) for a given task requires comparing many strong candidates, yet standard evaluation relies on costly annotations over fixed evaluation sets. To address this challenge, we develop SELECT-LLM, the first framework for active model selection of LLMs. SELECT-LLM aims to identify a small set of queries whose annotations are most informative about which LLM performs best on the task. To this end, we introduce a query selection rule based on expected information gain, computed from pairwise similarities between the outputs of candidate models. Because the rule uses only generated responses, SELECT-LLM makes no assumptions about model architecture and requires no access to model weights, so it applies equally to open-weight and black-box LLMs. We evaluate SELECT-LLM on 23 datasets and 156 models, spanning diverse task families and multiple text evaluation metrics. SELECT-LLM outperforms the strongest baseline in every setting, reducing annotation cost by up to 81.8% for best-model selection and by up to 84.78% for near-best-model selection.
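The abstract does not give the exact selection rule, so the following is only a minimal sketch of the general idea under stated assumptions, not SELECT-LLM's implementation. It keeps a belief `prior` over which of K candidates is best and, for each unannotated query, a K×K matrix `sim` of pairwise similarities between candidate outputs. The observation model is a hypothetical assumption: if model h is best, the annotated reference is assumed most likely to resemble outputs similar to h's, expressed as a softmax with a made-up temperature `beta`. The expected information gain of a query is then the mutual information H(p) − Σ_j P(j) H(p(· | j)), and the query with the largest gain is annotated next. All function names here are illustrative.

```python
import numpy as np


def entropy(p: np.ndarray) -> float:
    """Shannon entropy (in nats) of a probability vector, skipping zero entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def outcome_likelihood(sim: np.ndarray, beta: float) -> np.ndarray:
    """Hypothetical observation model: lik[h, j] = P(outcome j | model h is best).

    sim[h, j] is the similarity between the outputs of models h and j on one
    query; each row is a softmax over j (an assumption, not the paper's model).
    """
    logits = beta * sim
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def expected_information_gain(prior: np.ndarray, sim: np.ndarray, beta: float = 5.0) -> float:
    """Mutual information between the best-model identity and one query's outcome."""
    lik = outcome_likelihood(sim, beta)  # shape (K, K), indexed [h, j]
    p_outcome = prior @ lik              # P(j) = sum_h P(h) * P(j | h)
    eig = entropy(prior)
    for j, pj in enumerate(p_outcome):
        if pj <= 0:
            continue
        posterior = prior * lik[:, j] / pj  # Bayes rule for outcome j
        eig -= pj * entropy(posterior)
    return eig


def select_query(prior: np.ndarray, sims: list, annotated: set, beta: float = 5.0) -> int:
    """Greedily pick the unannotated query with the highest expected information gain."""
    scores = {
        i: expected_information_gain(prior, s, beta)
        for i, s in enumerate(sims)
        if i not in annotated
    }
    return max(scores, key=scores.get)


def update(prior: np.ndarray, sim: np.ndarray, observed_j: int, beta: float = 5.0) -> np.ndarray:
    """Posterior over the best model after observing outcome observed_j for one query."""
    post = prior * outcome_likelihood(sim, beta)[:, observed_j]
    return post / post.sum()


# Toy usage: three candidates, two queries.
prior = np.full(3, 1 / 3)
sims = [
    np.ones((3, 3)),  # query 0: all candidates answer identically
    np.eye(3),        # query 1: all candidates disagree
]
print(select_query(prior, sims, annotated=set()))  # → 1
```

In this toy setting, annotating query 0 cannot discriminate between candidates (its gain is zero), while query 1, where every candidate answers differently, is chosen first. This illustrates why disagreement among model outputs is a natural signal for deciding which queries to annotate.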


Cite

@article{arxiv.2605.24981,
  title  = {Large Language Model Selection with Limited Annotations},
  author = {Yavuz Durmazkeser and Patrik Okanovic and Andreas Kirsch and Torsten Hoefler and Nezihe Merve Gürel},
  journal= {arXiv preprint arXiv:2605.24981},
  year   = {2026}
}

Comments

33 pages, 5 figures, 4 tables