Batch Bayesian Active Learning with Partial Batch Label Sampling

Machine Learning 2026-05-12 v3 Artificial Intelligence Machine Learning

Abstract

Over the past couple of decades, many active learning acquisition functions have been proposed, leaving practitioners with an unclear choice of which to use. Bayesian active learning offers principled objectives with intuitive interpretations, including Expected Error Reduction (EER), Expected Predictive Information Gain (EPIG), and Bayesian Active Learning by Disagreement (BALD). A key challenge for such methods is the difficulty of scaling to large batch sizes, which leads either to computational challenges (BatchBALD) or to dramatic performance drops (top-$B$ selection). Here, using a particular formulation of Bayesian Decision Theory, we derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. We show experimentally on several datasets that ParBaLS EPIG gives superior performance under a fixed budget when using Bayesian Logistic Regression on embeddings from large pre-trained models. Our code is available at https://github.com/ADDAPT-ML/ParBaLS.

Cite

@article{arxiv.2510.09877,
  title  = {Batch Bayesian Active Learning with Partial Batch Label Sampling},
  author = {Kangping Hu and Stephen Mussmann},
  journal = {arXiv preprint arXiv:2510.09877},
  year   = {2026}
}