Related papers: Correlation and variable importance in random fore…

Asymptotic Unbiasedness of the Permutation Importance Measure in Random Forest Models

Variable selection in sparse regression models is an important task as applications ranging from biomedical research to econometrics have shown. Especially for higher dimensional regression problems, for which the link function between…

Machine Learning · Statistics 2019-12-10 Burim Ramosaj , Markus Pauly

Random Forest Variable Importance-based Selection Algorithm in Class Imbalance Problem

Random Forest is a machine learning method that offers many advantages, including the ability to easily measure variable importance. Class balancing technique is a well-known solution to deal with class imbalance problem. However, it has…

Machine Learning · Statistics 2023-12-19 Yunbi Nam , Sunwoo Han

Challenges in Variable Importance Ranking Under Correlation

Variable importance plays a pivotal role in interpretable machine learning as it helps measure the impact of factors on the output of the prediction model. Model agnostic methods based on the generation of "null" features via permutation…

Machine Learning · Statistics 2024-02-07 Annie Liang , Thomas Jemielita , Andy Liaw , Vladimir Svetnik , Lingkang Huang , Richard Baumgartner , Jason M. Klusowski

Prediction Error Reduction Function as a Variable Importance Score

This paper introduces and develops a novel variable importance score function in the context of ensemble learning and demonstrates its appeal both theoretically and empirically. Our proposed score function is simple and more straightforward…

Machine Learning · Statistics 2015-01-27 Ernest Fokoué

Variable importance in binary regression trees and forests

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally…

Machine Learning · Statistics 2009-09-29 Hemant Ishwaran

Variable importance for causal forests: breaking down the heterogeneity of treatment effects

Causal random forests provide efficient estimates of heterogeneous treatment effects. However, forest algorithms are also well-known for their black-box nature, and therefore, do not characterize how input variables are involved in…

Machine Learning · Statistics 2023-08-08 Clément Bénard , Julie Josse

Grouped variable importance with random forests and application to multiple functional data analysis

The selection of grouped variables using the random forest algorithm is considered. First a new importance measure adapted for groups of variables is proposed. Theoretical insights into this criterion are given for additive regression…

Methodology · Statistics 2015-05-20 Baptiste Gregorutti , Bertrand Michel , Philippe Saint-Pierre

The All Relevant Feature Selection using Random Forest

In this paper we examine the application of the random forest classifier for the all relevant feature selection problem. To this end we first examine two recently proposed all relevant feature selection algorithms, both being a random…

Artificial Intelligence · Computer Science 2011-06-28 Miron B. Kursa , Witold R. Rudnicki

MMD-based Variable Importance for Distributional Random Forest

Distributional Random Forest (DRF) is a flexible forest-based method to estimate the full conditional distribution of a multivariate output of interest given input variables. In this article, we introduce a variable importance algorithm for…

Machine Learning · Statistics 2024-02-15 Clément Bénard , Jeffrey Näf , Julie Josse

Context-dependent feature analysis with random forests

In many cases, feature selection is often more complicated than identifying a single subset of input variables that would together explain the output. There may be interactions that depend on contextual information, i.e., variables that…

Machine Learning · Statistics 2016-05-13 Antonio Sutera , Gilles Louppe , Vân Anh Huynh-Thu , Louis Wehenkel , Pierre Geurts

Dimension Reduction Forests: Local Variable Importance using Structured Random Forests

Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such…

Methodology · Statistics 2021-03-25 Joshua Daniel Loyal , Ruoqing Zhu , Yifan Cui , Xin Zhang

Variable selection from random forests: application to gene expression data

Random forest is a classification algorithm well suited for microarray data: it shows excellent performance even when most predictive variables are noise, can be used when the number of variables is much larger than the number of…

Quantitative Methods · Quantitative Biology 2007-05-23 Ramon Diaz-Uriarte , Sara Alvarez de Andres

A Notion of Feature Importance by Decorrelation and Detection of Trends by Random Forest Regression

In many studies, we want to determine the influence of certain features on a dependent variable. More specifically, we are interested in the strength of the influence -- i.e., is the feature relevant? -- and, if so, how the feature…

Machine Learning · Statistics 2023-03-03 Yannick Gerstorfer , Lena Krieg , Max Hahn-Klimroth

Variable Selection Using Bayesian Additive Regression Trees

Variable selection is an important statistical problem. This problem becomes more challenging when the candidate predictors are of mixed type (e.g. continuous and binary) and impact the response variable in nonlinear and/or non-additive…

Methodology · Statistics 2021-12-30 Chuji Luo , Michael J. Daniels

Effect of hyperparameters on variable selection in random forests

Random forests (RFs) are well suited for prediction modeling and variable selection in high-dimensional omics studies. The effect of hyperparameters of the RF algorithm on prediction performance and variable importance estimation have…

Machine Learning · Statistics 2025-01-28 Cesaire J. K. Fouodo , Lea L. Kronziel , Inke R. König , Silke Szymczak

Trees, forests, and impurity-based variable importance

Tree ensemble methods such as random forests [Breiman, 2001] are very popular to handle high-dimensional tabular data sets, notably because of their good predictive accuracy. However, when machine learning is used for decision-making…

Statistics Theory · Mathematics 2021-12-28 Erwan Scornet

Consistent Estimation of Residual Variance with Random Forest Out-Of-Bag Errors

The issue of estimating residual variance in regression models has experienced relatively little attention in the machine learning community. However, the estimate is of primary interest in many practical applications, e.g. as a primary…

Statistics Theory · Mathematics 2018-12-18 Burim Ramosaj , Markus Pauly

Understanding Random Forests: From Theory to Practice

Data analysis and machine learning have become an integrative part of the modern scientific methodology, offering automated procedures for the prediction of a phenomenon based on past observations, unraveling underlying patterns in data and…

Machine Learning · Statistics 2015-06-04 Gilles Louppe

Better Model Selection with a new Definition of Feature Importance

Feature importance aims at measuring how crucial each input feature is for model prediction. It is widely used in feature engineering, model selection and explainable artificial intelligence (XAI). In this paper, we propose a new tree-model…

Machine Learning · Statistics 2020-09-17 Fan Fang , Carmine Ventre , Lingbo Li , Leslie Kanthan , Fan Wu , Michail Basios

All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously

Variable importance (VI) tools describe how much covariates contribute to a prediction model's accuracy. However, important variables for one well-performing model (for example, a linear model $f(\mathbf{x})=\mathbf{x}^{T}\beta$ with a…

Methodology · Statistics 2019-12-24 Aaron Fisher , Cynthia Rudin , Francesca Dominici