
Topic Modelling Black Box Optimization

Machine Learning; 2025-12-19; v1; Artificial Intelligence; Computation and Language; Neural and Evolutionary Computing

Abstract

Choosing the number of topics T in Latent Dirichlet Allocation (LDA) is a key design decision that strongly affects both the statistical fit and interpretability of topic models. In this work, we formulate the selection of T as a discrete black-box optimization problem, where each function evaluation corresponds to training an LDA model and measuring its validation perplexity. Under a fixed evaluation budget, we compare four families of optimizers: two hand-designed evolutionary methods - Genetic Algorithm (GA) and Evolution Strategy (ES) - and two learned, amortized approaches, Preferential Amortized Black-Box Optimization (PABBO) and Sharpness-Aware Black-Box Optimization (SABBO). Our experiments show that, while GA, ES, PABBO, and SABBO eventually reach a similar band of final perplexity, the amortized optimizers are substantially more sample- and time-efficient. SABBO typically identifies a near-optimal topic number after essentially a single evaluation, and PABBO finds competitive configurations within a few evaluations, whereas GA and ES require almost the full budget to approach the same region.
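The black-box formulation in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical stand-in objective, `toy_perplexity`, in place of "train LDA with T topics and return validation perplexity". The search is a simple (1+1)-style integer evolution strategy written for illustration; it is not the GA, ES, PABBO, or SABBO implementation evaluated in the paper. The function names, search bounds, step size, and budget are all assumptions.

```python
import random
from typing import Callable, Dict, Tuple


def toy_perplexity(num_topics: int) -> float:
    """Hypothetical stand-in for training LDA and measuring validation perplexity.

    A real objective would fit an LDA model with `num_topics` topics and
    return its perplexity on held-out documents.
    """
    return 1000.0 + 0.5 * (num_topics - 40) ** 2


def one_plus_one_es(
    objective: Callable[[int], float],
    low: int,
    high: int,
    budget: int,
    step: int = 10,
    seed: int = 0,
) -> Tuple[int, float, Dict[int, float]]:
    """Minimise `objective` over integers in [low, high] with at most `budget` evaluations."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)
    evaluated: Dict[int, float] = {}  # maps each tried T to its score

    # The first evaluation is a uniformly random topic count.
    best_t = rng.randint(low, high)
    evaluated[best_t] = objective(best_t)
    best_score = evaluated[best_t]

    # Cap the number of proposals so the loop always terminates,
    # even if many mutations land on already-evaluated values.
    for _ in range(100 * budget):
        if len(evaluated) >= budget:
            break
        # Mutate the incumbent and clip to the search bounds.
        candidate = max(low, min(high, best_t + rng.randint(-step, step)))
        if candidate in evaluated:
            continue  # a repeated T costs no budget
        evaluated[candidate] = objective(candidate)
        # Keep the candidate only if it strictly improves the score.
        if evaluated[candidate] < best_score:
            best_t, best_score = candidate, evaluated[candidate]

    return best_t, best_score, evaluated


if __name__ == "__main__":
    t, score, history = one_plus_one_es(toy_perplexity, low=2, high=100, budget=20)
    print(f"best T = {t}, score = {score:.1f}, evaluations = {len(history)}")
```

Each distinct value of T costs one unit of budget, which mirrors the setting where every evaluation is a full LDA training run. Swapping `toy_perplexity` for a real training-and-scoring function leaves the search loop unchanged.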


Cite

@article{arxiv.2512.16445,
  title  = {Topic Modelling Black Box Optimization},
  author = {Roman Akramov and Artem Khamatullin and Svetlana Glazyrina and Maksim Kryzhanovskiy and Roman Ischenko},
  journal= {arXiv preprint arXiv:2512.16445},
  year   = {2025}
}