A general framework for inference on algorithm-agnostic variable importance

Methodology 2025-10-23 v2 Statistics Theory Machine Learning

Abstract

In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response, that is, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment does not necessarily characterize the prediction potential of features, and may provide a misleading reflection of the intrinsic value of these features. To address this limitation, we propose a general framework for nonparametric inference on interpretable algorithm-agnostic variable importance. We define variable importance as a population-level contrast between the oracle predictiveness of all available features and that of all features except those under consideration. We propose a nonparametric efficient estimation procedure that allows the construction of valid confidence intervals, even when machine learning techniques are used. We also outline a valid strategy for testing the null importance hypothesis. Through simulations, we show that our proposal has good operating characteristics, and we illustrate its use with data from a study of an antibody against HIV-1 infection.
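
To make the definition concrete, the contrast described above can be illustrated on simulated data. The following is a minimal sketch under stated assumptions, not the authors' estimator: it uses R^2 as the predictiveness measure, ordinary least squares as a stand-in for the prediction algorithm (the simulated truth is linear, so least squares approximates the oracle here), and a single train/test split. The function names (`r_squared`, `fit_predict_ols`, `importance_contrast`) are hypothetical, and the sketch produces only a point estimate, with no confidence interval or hypothesis test.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Predictiveness measured as R^2 = 1 - MSE / Var(y)."""
    return 1.0 - np.mean((y_true - y_pred) ** 2) / np.var(y_true)

def fit_predict_ols(X_train, y_train, X_test):
    """Least-squares fit with an intercept; a stand-in for any learner."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def importance_contrast(X, y, subset, seed=0):
    """Held-out R^2 using all features minus held-out R^2 without `subset`."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    train, test = idx[:half], idx[half:]
    dropped = set(subset)
    keep = [j for j in range(X.shape[1]) if j not in dropped]
    full = r_squared(
        y[test], fit_predict_ols(X[train], y[train], X[test])
    )
    reduced = r_squared(
        y[test],
        fit_predict_ols(X[train][:, keep], y[train], X[test][:, keep]),
    )
    return full - reduced

# Simulated example (assumed setup): y depends on features 0 and 1 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(4000, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=4000)

vim_x0 = importance_contrast(X, y, subset=[0])  # important feature
vim_x2 = importance_contrast(X, y, subset=[2])  # pure-noise feature
```

In this setup the population value of the contrast for feature 0 is 5/6 - 1/6 = 2/3, since Var(y) = 6 and removing feature 0 leaves only one unit of explained variance, while the value for feature 2 is zero. The estimates should land near those values, up to sampling error.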

Cite

@article{arxiv.2004.03683,
  title  = {A general framework for inference on algorithm-agnostic variable importance},
  author = {Brian D. Williamson and Peter B. Gilbert and Noah R. Simon and Marco Carone},
  journal= {arXiv preprint arXiv:2004.03683},
  year   = {2025}
}

Comments

69 total pages (35 in the main document, 34 supplementary), 23 figures (4 in the main document, 19 supplementary)