LLM-Driven Performance-Space Augmentation for Meta-Learning-Based Algorithm Selection

Darren Zhu; Daren Ler

Machine Learning 2026-05-12 v1

Authors: Darren Zhu, Daren Ler

Abstract

Meta-learning for algorithm selection relies on a meta-dataset in which each row corresponds to a supervised learning dataset described by meta-features and labelled with a target value that is associated with algorithm choice (typically, some function of algorithm performance). A persistent limitation is that the number of curated real-world datasets is small, resulting in sparse meta-datasets that constrain meta-learner generalisation. In this paper, we address this problem by augmenting the meta-dataset with synthetic regression datasets produced via a large language model (LLM), with generation steered toward target regions of a low-dimensional performance space. In our experiments, we adopt a two-dimensional geometric setting defined by the cross-validated $R^2$ scores of two anchor algorithms, known as landmarkers. We compare two augmentation strategies: (1) uniform sampling, which distributes synthetic datasets across the performance space; and (2) margin-based sampling, which concentrates them near the decision boundary where landmarker preference is most ambiguous. Across 42 real-world UCI regression datasets and 730 synthetic datasets, both strategies substantially improve meta-learner performance over the unaugmented baseline under regression and multi-label evaluation formulations. However, uniform augmentation consistently outperforms margin-based augmentation, achieving a 17.47% relative reduction in Hamming loss, a 100.41% relative improvement in subset accuracy, and a 6.09% relative gain in pooled out-of-fold $R^2$. These results lead us to postulate a central thesis: the performance of algorithms resides on a low-dimensional performance manifold, whose reconstruction bias may be minimised by user-guided LLMs that seek to maximise uniform $\epsilon$-cover, and consequently, lead to improved meta-learning for algorithm selection.
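The two augmentation strategies contrasted in the abstract can be illustrated with a minimal sketch of how target points in the two-dimensional performance space might be drawn. This is not the authors' implementation. It assumes that both landmarker $R^2$ scores are restricted to $[0, 1]$, that the decision boundary is the diagonal where the two scores are equal, and that margin-based targets are spread around that diagonal with a Gaussian gap. The function names and the `width` parameter are hypothetical.

```python
import numpy as np


def sample_uniform_targets(n, rng, low=0.0, high=1.0):
    """Draw n target points spread evenly over the 2-D performance space.

    Each row is (r2_a, r2_b), the target R^2 of the two landmarkers.
    The [low, high] range is an assumption made for illustration.
    """
    return rng.uniform(low, high, size=(n, 2))


def sample_margin_targets(n, rng, width=0.05, low=0.0, high=1.0):
    """Draw n target points concentrated near the diagonal r2_a == r2_b.

    Assumption: the decision boundary is the line where both landmarkers
    score equally; `width` (hypothetical) sets the spread around it.
    """
    centre = rng.uniform(low, high, size=n)   # position along the diagonal
    gap = rng.normal(0.0, width, size=n)      # signed gap between the scores
    points = np.column_stack([centre + gap / 2, centre - gap / 2])
    return np.clip(points, low, high)         # keep targets inside the space


def mean_gap(points):
    """Average absolute difference between the two landmarker scores."""
    return float(np.mean(np.abs(points[:, 0] - points[:, 1])))


rng = np.random.default_rng(0)
uniform_targets = sample_uniform_targets(500, rng)
margin_targets = sample_margin_targets(500, rng)
```

In this sketch, `mean_gap(margin_targets)` is much smaller than `mean_gap(uniform_targets)`, which reflects the difference in coverage: margin-based targets cluster where landmarker preference is ambiguous, while uniform targets cover the whole space. Each target point would then be passed to the LLM as the region toward which dataset generation is steered.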

Keywords

data augmentation, instruction tuning, large language model training

Cite

@article{arxiv.2605.09518,
  title  = {LLM-Driven Performance-Space Augmentation for Meta-Learning-Based Algorithm Selection},
  author = {Darren Zhu and Daren Ler},
  journal= {arXiv preprint arXiv:2605.09518},
  year   = {2026}
}
