Related papers: Unbiased variable importance for random forests

From unbiased MDI Feature Importance to Explainable AI for Trees

We attempt to give a unifying view of the various recent attempts to (i) improve the interpretability of tree-based models and (ii) debias the default variable-importance measure in random forests, Gini importance. In particular, we…

Machine Learning · Statistics 2021-10-01 Markus Loecher

Unbiased Measurement of Feature Importance in Tree-Based Methods

We propose a modification that corrects for split-improvement variable importance measures in Random Forests and other tree-based methods. These methods have been shown to be biased towards increasing the importance of features with more…

Machine Learning · Statistics 2020-03-25 Zhengze Zhou, Giles Hooker

Asymptotic Unbiasedness of the Permutation Importance Measure in Random Forest Models

Variable selection in sparse regression models is an important task as applications ranging from biomedical research to econometrics have shown. Especially for higher dimensional regression problems, for which the link function between…

Machine Learning · Statistics 2019-12-10 Burim Ramosaj, Markus Pauly

Variable importance in binary regression trees and forests

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally…

Machine Learning · Statistics 2009-09-29 Hemant Ishwaran

Value-aware Importance Weighting for Off-policy Reinforcement Learning

Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples from a distribution to obtain unbiased estimates under another distribution. However,…

Machine Learning · Computer Science 2023-06-28 Kristopher De Asis, Eric Graves, Richard S. Sutton

Prediction Error Reduction Function as a Variable Importance Score

This paper introduces and develops a novel variable importance score function in the context of ensemble learning and demonstrates its appeal both theoretically and empirically. Our proposed score function is simple and more straightforward…

Machine Learning · Statistics 2015-01-27 Ernest Fokoué

A Debiased MDI Feature Importance Measure for Random Forests

Tree ensembles such as Random Forests have achieved impressive empirical success across a wide variety of applications. To understand how these models make predictions, people routinely turn to feature importance measures calculated from…

Machine Learning · Statistics 2019-10-29 Xiao Li, Yu Wang, Sumanta Basu, Karl Kumbier, Bin Yu

A Central Limit Theorem for the permutation importance measure

Random Forests have become a widely used tool in machine learning since their introduction in 2001, known for their strong performance in classification and regression tasks. One key feature of Random Forests is the Random Forest…

Statistics Theory · Mathematics 2025-12-18 Nico Föge, Lena Schmid, Marc Ditzhaus, Markus Pauly

Trees, forests, and impurity-based variable importance

Tree ensemble methods such as random forests [Breiman, 2001] are very popular for handling high-dimensional tabular data sets, notably because of their good predictive accuracy. However, when machine learning is used for decision-making…

Statistics Theory · Mathematics 2021-12-28 Erwan Scornet

Cost-complexity pruning of random forests

Random forests perform bootstrap aggregation by sampling the training samples with replacement. This enables the evaluation of out-of-bag error, which serves as an internal cross-validation mechanism. Our motivation lies in using the…

Machine Learning · Statistics 2017-07-20 Kiran Bangalore Ravi, Jean Serra

Random Forest Variable Importance-based Selection Algorithm in Class Imbalance Problem

Random Forest is a machine learning method that offers many advantages, including the ability to easily measure variable importance. Class balancing is a well-known technique for dealing with the class imbalance problem. However, it has…

Machine Learning · Statistics 2023-12-19 Yunbi Nam, Sunwoo Han

Multi forests: Variable importance for multi-class outcomes

In prediction tasks with multi-class outcomes, identifying covariates specifically associated with one or more outcome classes can be important. Conventional variable importance measures (VIMs) from random forests (RFs), like permutation…

Machine Learning · Statistics 2024-09-16 Roman Hornung, Alexander Hapfelmeier

Confidence Intervals for Random Forest Permutation Importance with Missing Data

Random Forests are renowned for their predictive accuracy, but valid inference, particularly about permutation-based feature importances, remains challenging. Existing methods, such as the confidence intervals (CIs) from Ishwaran et al.…

Methodology · Statistics 2025-07-21 Nico Föge, Markus Pauly

Consistent Estimation of Residual Variance with Random Forest Out-Of-Bag Errors

The issue of estimating residual variance in regression models has received relatively little attention in the machine learning community. However, the estimate is of primary interest in many practical applications, e.g. as a primary…

Statistics Theory · Mathematics 2018-12-18 Burim Ramosaj, Markus Pauly

TRIP: A Nonparametric Test to Diagnose Biased Feature Importance Scores

Along with accurate prediction, understanding the contribution of each feature to the making of the prediction, i.e., the importance of the feature, is a desirable and arguably necessary component of a machine learning model. For a complex…

Machine Learning · Computer Science 2025-07-11 Aaron Foote, Danny Krizanc

Grouped variable importance with random forests and application to multiple functional data analysis

The selection of grouped variables using the random forest algorithm is considered. First a new importance measure adapted for groups of variables is proposed. Theoretical insights into this criterion are given for additive regression…

Methodology · Statistics 2015-05-20 Baptiste Gregorutti, Bertrand Michel, Philippe Saint-Pierre

Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting

A learned generative model often produces biased statistics relative to the underlying data distribution. A standard technique to correct this bias is importance sampling, where samples from the model are weighted by the likelihood ratio…

Machine Learning · Statistics 2019-11-05 Aditya Grover, Jiaming Song, Alekh Agarwal, Kenneth Tran, Ashish Kapoor, Eric Horvitz, Stefano Ermon

From global to local MDI variable importances for random forests and when they are Shapley values

Random forests have been widely used for their ability to provide so-called importance measures, which give insight at a global (per dataset) level on the relevance of input variables to predict a certain output. On the other hand, methods…

Machine Learning · Statistics 2021-11-04 Antonio Sutera, Gilles Louppe, Van Anh Huynh-Thu, Louis Wehenkel, Pierre Geurts

Statistically Valid Variable Importance Assessment through Conditional Permutations

Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference…

Machine Learning · Computer Science 2023-10-27 Ahmad Chamma, Denis A. Engemann, Bertrand Thirion

Correlation and variable importance in random forests

This paper is about variable selection with the random forests algorithm in the presence of correlated predictors. In high-dimensional regression or classification frameworks, variable selection is a difficult task that becomes even more…

Methodology · Statistics 2016-04-19 Baptiste Gregorutti, Bertrand Michel, Philippe Saint-Pierre