Related papers: A general framework for inference on algorithm-agn…

Assessing variable importance in survival analysis using machine learning

Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. For example, in HIV vaccine trials, participant…

Methodology · Statistics 2025-03-27 Charles J. Wolock, Peter B. Gilbert, Noah Simon, Marco Carone

Nonparametric Feature Impact and Importance

Practitioners use feature importance to rank and eliminate weak predictors during model development in an effort to simplify models and improve generality. Unfortunately, they also routinely conflate such feature importance measures with…

Machine Learning · Computer Science 2020-06-09 Terence Parr, James D. Wilson, Jeff Hamrick

A principled approach for comparing Variable Importance

Variable importance measures (VIMs) aim to quantify the contribution of each input covariate to the predictability of a given output. With the growing interest in explainable AI, numerous VIMs have been proposed, many of which are heuristic…

Methodology · Statistics 2025-09-23 Angel Reyero-Lobo, Pierre Neuvial, Bertrand Thirion

Inherent Inconsistencies of Feature Importance

The rapid advancement and widespread adoption of machine learning-driven technologies have underscored the practical and ethical need for creating interpretable artificial intelligence systems. Feature importance, a method that assigns…

Machine Learning · Computer Science 2023-12-07 Nimrod Harel, Uri Obolski, Ran Gilad-Bachrach

Challenges in Variable Importance Ranking Under Correlation

Variable importance plays a pivotal role in interpretable machine learning as it helps measure the impact of factors on the output of the prediction model. Model agnostic methods based on the generation of "null" features via permutation…

Machine Learning · Statistics 2024-02-07 Annie Liang, Thomas Jemielita, Andy Liaw, Vladimir Svetnik, Lingkang Huang, Richard Baumgartner, Jason M. Klusowski

Inference on Variable Importance for Treatment Effect Heterogeneity: Shapley Values and Beyond

We provide an inferential framework to assess variable importance for heterogeneous treatment effects. This assessment is especially useful in high-risk domains such as medicine, where decision makers hesitate to rely on black-box treatment…

Methodology · Statistics 2026-05-11 Pawel Morzywolek, Peter B. Gilbert, Alex Luedtke

Confident Feature Ranking

Machine learning models are widely applied in various fields. Stakeholders often use post-hoc feature importance methods to better understand the input features' contribution to the models' predictions. The interpretation of the importance…

Machine Learning · Statistics 2024-04-19 Bitya Neuhof, Yuval Benjamini

Inference on summaries of a model-agnostic longitudinal variable importance trajectory with application to suicide prevention

Risk of suicide attempt varies over time. Understanding the importance of risk factors measured at a mental health visit can help clinicians evaluate future risk and provide appropriate care during the visit. In prediction settings where…

Methodology · Statistics 2024-08-22 Brian D. Williamson, Erica E. M. Moodie, Gregory E. Simon, Rebecca C. Rossom, Susan M. Shortreed

Inference on function-valued parameters using a restricted score test

It is often of interest to make inference on an unknown function that is a local parameter of the data-generating mechanism, such as a density or regression function. Such estimands can typically only be estimated at a…

Methodology · Statistics 2021-05-17 Aaron Hudson, Marco Carone, Ali Shojaie

Factor Importance Ranking and Selection using Total Indices

Factor importance measures the impact of each feature on output prediction accuracy. Many existing works focus on the model-based importance, but an important feature in one learning algorithm may hold little significance in another model.…

Methodology · Statistics 2025-06-24 Chaofan Huang, V. Roshan Joseph

Generalized Permutation Framework for Testing Model Variable Significance

A common problem in machine learning is determining if a variable significantly contributes to a model's prediction performance. This problem is aggravated for datasets, such as gene expression datasets, that suffer the worst case of…

Methodology · Statistics 2023-10-13 Yue Wu, Ted Spaide, Kenji Nakamichi, Russell Van Gelder, Aaron Lee

Feature Importance Measure for Non-linear Learning Algorithms

Complex problems may require sophisticated, non-linear learning methods such as kernel machines or deep neural networks to achieve state of the art prediction accuracies. However, high prediction accuracies are not the only objective to…

Artificial Intelligence · Computer Science 2016-11-24 Marina M.-C. Vidovic, Nico Görnitz, Klaus-Robert Müller, Marius Kloft

Towards a More Reliable Interpretation of Machine Learning Outputs for Safety-Critical Systems using Feature Importance Fusion

When machine learning supports decision-making in safety-critical systems, it is important to verify and understand the reasons why a particular output is produced. Although feature importance calculation approaches assist in…

Machine Learning · Statistics 2020-09-14 Divish Rengasamy, Benjamin Rothwell, Grazziela Figueredo

Variable selection for general index models via sliced inverse regression

Variable selection, also known as feature selection in machine learning, plays an important role in modeling high dimensional data and is key to data-driven scientific discoveries. We consider here the problem of detecting influential…

Methodology · Statistics 2014-09-24 Bo Jiang, Jun S. Liu

A General Framework of Nonparametric Feature Selection in High-Dimensional Data

Nonparametric feature selection in high-dimensional data is an important and challenging problem in statistics and machine learning fields. Most of the existing methods for feature selection focus on parametric or additive models which may…

Methodology · Statistics 2021-03-31 Hang Yu, Yuanjia Wang, Donglin Zeng

A general nonparametric framework for testing hypotheses about function-valued parameters

We present a general nonparametric approach for testing whether a statistical parameter defined through conditional distributions is constant across the conditioning variables. Such hypotheses arise naturally in problems such as assessing…

Methodology · Statistics 2026-04-23 Albert Osom, Ali Shojaie, Aaron Hudson

Interpretable Approximation of High-Dimensional Data

In this paper we apply the previously introduced approximation method based on the ANOVA (analysis of variance) decomposition and Grouped Transformations to synthetic and real data. The advantage of this method is the interpretability of…

Machine Learning · Statistics 2022-01-31 Daniel Potts, Michael Schmischke

Comparing Two Categorical Gini Correlations with Applications to Classification Problems

This article proposes an inferential framework for comparing predictor importance in classification problems with categorical response variables. The approach is based on the categorical Gini correlation (CGC) proposed by Dang et al.…

Methodology · Statistics 2026-05-19 Sameera Hewage, Yongli Sang

On the Asymptotics of Importance Weighted Variational Inference

For complex latent variable models, the likelihood function is not available in closed form. In this context, a popular method to perform parameter estimation is Importance Weighted Variational Inference. It essentially maximizes the…

Statistics Theory · Mathematics 2025-01-16 Badr-Eddine Cherief-Abdellatif, Randal Douc, Arnaud Doucet, Hugo Marival

Nonparametric inference on non-negative dissimilarity measures at the boundary of the parameter space

It is often of interest to assess whether a function-valued statistical parameter, such as a density function or a mean regression function, is equal to any function in a class of candidate null parameters. This can be framed as a…

Methodology · Statistics 2023-06-14 Aaron Hudson