
Replica Analysis for Ensemble Techniques in Variable Selection

Statistics Theory; Disordered Systems and Neural Networks; Statistical Mechanics; Information Theory (math.IT). 2025-02-27, v1

Abstract

Variable selection is a problem in statistics that aims to find the subset of the $N$-dimensional candidate explanatory variables that are truly related to the generation process of the response variable. In high-dimensional setups, where the input dimension $N$ is comparable to the data size $M$, it is difficult to use classic methods based on $p$-values. Therefore, methods based on ensemble learning are often used. In this review article, we introduce how the performance of these ensemble-based methods can be systematically analyzed using the replica method from statistical mechanics when $N$ and $M$ diverge at the same rate, that is, $N, M \to \infty$ with $M/N \to \alpha \in (0, \infty)$. As a concrete application, we analyze the power of stability selection (SS) and the derandomized knockoff (dKO) with $\ell_1$-regularized statistics in the high-dimensional linear model. The results indicate that dKO provably outperforms the vanilla knockoff and the standard SS, while increasing the bootstrap resampling rate in SS might further improve the detection power.
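To make the stability selection procedure named in the abstract concrete, here is a minimal sketch in pure Python. It only illustrates the ensemble idea: refit an $\ell_1$-regularized regression on resampled data and record how often each variable is selected. It is not the replica analysis itself and may differ from the exact setup studied in the paper. The function names, the `resample_rate` parameter (number of draws divided by $M$, drawn with replacement), the regularization strength, and the synthetic data are all assumptions made for this sketch.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |x|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=50):
    """Minimize (1/(2m)) * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent."""
    m, n = len(X), len(X[0])
    w = [0.0] * n
    resid = list(y)  # residual y - Xw; equals y because w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(m)) / m for j in range(n)]
    for _ in range(n_sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * resid[i] for i in range(m)) / m + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(m):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def stability_selection(X, y, lam, n_resamples=20, resample_rate=1.0, seed=0):
    """Return, per variable, the fraction of resampled Lasso fits selecting it.

    Assumption of this sketch: each resample draws round(resample_rate * m)
    rows with replacement.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    n_draws = max(1, round(resample_rate * m))
    counts = [0] * n
    for _ in range(n_resamples):
        idx = [rng.randrange(m) for _ in range(n_draws)]
        w = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(n):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]


def make_data(m, n, k, noise, seed):
    """Synthetic linear model (illustrative): the first k coefficients are 2.0."""
    rng = random.Random(seed)
    w_true = [2.0 if j < k else 0.0 for j in range(n)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [sum(X[i][j] * w_true[j] for j in range(n)) + noise * rng.gauss(0.0, 1.0)
         for i in range(m)]
    return X, y, w_true


if __name__ == "__main__":
    X, y, w_true = make_data(m=40, n=15, k=3, noise=0.5, seed=1)
    probs = stability_selection(X, y, lam=0.4, resample_rate=1.0, seed=2)
    for j, p in enumerate(probs):
        print(j, w_true[j], round(p, 2))
```

Variables whose selection frequency exceeds a chosen threshold would then be reported as relevant. Varying `resample_rate` in this sketch is one way to experiment with the effect of the bootstrap resampling rate that the abstract refers to.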


Cite

@article{arxiv.2408.16799,
  title  = {Replica Analysis for Ensemble Techniques in Variable Selection},
  author = {Takashi Takahashi},
  journal= {arXiv preprint arXiv:2408.16799},
  year   = {2025}
}

Comments

25 pages, 4 figures, 1 table
