Model-free controlled variable selection via data splitting
Abstract
Simultaneously identifying contributory variables and controlling the false discovery rate (FDR) in high-dimensional data is a crucial statistical challenge. In this paper, we propose a novel model-free variable selection procedure in the sufficient dimension reduction framework via a data splitting technique. The variable selection problem is first converted to a least squares procedure with several response transformations. We then construct a series of statistics with a global symmetry property and leverage this symmetry to derive a data-driven threshold aimed at error rate control. Our approach achieves finite-sample and asymptotic FDR control under mild theoretical conditions. Numerical experiments confirm that our procedure has satisfactory FDR control and higher power compared with existing methods.
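To make the idea of a symmetry-based, data-driven threshold concrete, the following is a minimal sketch and not the paper's exact procedure. It assumes a vector of per-variable statistics `w` whose entries are roughly symmetric about zero for null variables and large and positive for contributory ones, so the count in the negative tail can stand in for the unknown number of false discoveries in the positive tail. The function names, the `offset` parameter, and the specific ratio rule (the knockoff-style form) are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def symmetric_threshold(w, alpha, offset=1):
    """Smallest t > 0 whose estimated false discovery proportion is <= alpha.

    Assumption: null statistics are symmetric about zero, so
    #{j : w_j <= -t} estimates the number of nulls with w_j >= t.
    Returns infinity when no candidate threshold qualifies.
    """
    w = np.asarray(w, dtype=float)
    # Only the observed nonzero magnitudes need to be checked.
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        n_neg = np.sum(w <= -t)  # proxy for false discoveries
        n_pos = np.sum(w >= t)   # number of selections at level t
        if (offset + n_neg) / max(n_pos, 1) <= alpha:
            return float(t)
    return float("inf")


def select_variables(w, alpha, offset=1):
    """Indices of variables whose statistic reaches the threshold."""
    w = np.asarray(w, dtype=float)
    t = symmetric_threshold(w, alpha, offset)
    return np.flatnonzero(w >= t).tolist()
```

In this sketch, `offset=1` gives a conservative variant of the rule, while `offset=0` is more liberal. How the statistics themselves are built from the two halves of the split data is not described in the abstract and is therefore left out here.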
Cite
@article{arxiv.2210.12382,
title = {Model-free controlled variable selection via data splitting},
author = {Yixin Han and Xu Guo and Changliang Zou},
journal = {arXiv preprint arXiv:2210.12382},
year = {2024}
}
Comments
55 pages, 5 figures, 6 tables