
Model-free controlled variable selection via data splitting

Methodology 2024-04-23 v3

Abstract

Simultaneously identifying contributory variables while controlling the false discovery rate (FDR) in high-dimensional data is a crucial statistical challenge. In this paper, we propose a novel model-free variable selection procedure within the sufficient dimension reduction framework, based on a data splitting technique. The variable selection problem is first converted into a least squares procedure with several response transformations. We then construct a series of statistics with a global symmetry property and use this symmetry to derive a data-driven threshold that controls the error rate. Our approach achieves both finite-sample and asymptotic FDR control under mild theoretical conditions. Numerical experiments confirm that the procedure controls the FDR satisfactorily and has higher power than existing methods.
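The pipeline described in the abstract has four parts: response transformations, least squares on two independent halves, a statistic that is symmetric about zero for null variables, and a threshold chosen from the data. The sketch below illustrates that generic split-and-threshold idea. It is not the authors' procedure, and several choices in it are hypothetical. The transformations are indicator slices 1{y <= c_h} at empirical quantiles. The statistic is W_j = sum_h B1[j, h] * B2[j, h], the inner product of the two halves' coefficient rows. The threshold uses a "+1" offset in the estimated false discovery proportion, as in knockoff+. The sketch also assumes each half has more observations than variables, so plain least squares is well defined.

```python
import numpy as np


def transformed_responses(y, n_slices=5):
    """Hypothetical response transformations: indicators 1{y <= c_h} at the
    interior empirical quantiles of y (n_slices - 1 columns)."""
    cuts = np.quantile(y, np.linspace(0.0, 1.0, n_slices + 1)[1:-1])
    return np.column_stack([(y <= c).astype(float) for c in cuts])


def ls_coefficients(X, T):
    """Least squares coefficients (p x H) of every transformed response on X.
    Centering removes the intercept; assumes more rows than columns."""
    Xc = X - X.mean(axis=0)
    Tc = T - T.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Tc, rcond=None)
    return B


def split_statistics(X, y, n_slices=5, rng=None):
    """Fit on two disjoint halves and combine: W_j = <B1[j, :], B2[j, :]>.
    For a null variable the halves' coefficients are independent and
    centered, so W_j is (approximately) symmetric about zero."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(y))
    half1, half2 = idx[: len(y) // 2], idx[len(y) // 2 :]
    B1 = ls_coefficients(X[half1], transformed_responses(y[half1], n_slices))
    B2 = ls_coefficients(X[half2], transformed_responses(y[half2], n_slices))
    return np.sum(B1 * B2, axis=1)


def data_driven_threshold(W, q, offset=1):
    """Smallest t with (offset + #{W_j <= -t}) / max(#{W_j >= t}, 1) <= q.
    The negative tail estimates the number of false positives in the
    positive tail because null statistics are symmetric about zero."""
    for t in np.unique(np.abs(W[W != 0])):
        fdp_hat = (offset + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold meets the target: select nothing


def select(X, y, q=0.2, n_slices=5, rng=None):
    """Return selected indices, the statistics W, and the threshold t."""
    W = split_statistics(X, y, n_slices, rng)
    t = data_driven_threshold(W, q)
    return np.flatnonzero(W >= t), W, t


if __name__ == "__main__":
    # Simulated example: cubic link, 10 active variables out of 50.
    gen = np.random.default_rng(0)
    n, p, s = 2000, 50, 10
    X = gen.standard_normal((n, p))
    y = (X[:, :s].sum(axis=1) + gen.standard_normal(n)) ** 3
    selected, W, t = select(X, y, q=0.2, rng=1)
    print("threshold:", t, "selected:", selected.tolist())
```

Because the indicator slices depend only on the ordering of y, the cubic link in the example leaves the transformed responses unchanged. This is one simple way a slicing transform avoids committing to a particular regression model.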

Cite

@article{arxiv.2210.12382,
  title  = {Model-free controlled variable selection via data splitting},
  author = {Yixin Han and Xu Guo and Changliang Zou},
  journal= {arXiv preprint arXiv:2210.12382},
  year   = {2024}
}

Comments

55 pages, 5 figures, 6 tables