Related papers: Grouped variable importance with random forests an…

Variable importance for causal forests: breaking down the heterogeneity of treatment effects

Causal random forests provide efficient estimates of heterogeneous treatment effects. However, forest algorithms are also well-known for their black-box nature, and therefore, do not characterize how input variables are involved in…

Machine Learning · Statistics 2023-08-08 Clément Bénard, Julie Josse

Variable importance in binary regression trees and forests

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally…

Machine Learning · Statistics 2009-09-29 Hemant Ishwaran

Understanding Random Forests: From Theory to Practice

Data analysis and machine learning have become an integrative part of the modern scientific methodology, offering automated procedures for the prediction of a phenomenon based on past observations, unraveling underlying patterns in data and…

Machine Learning · Statistics 2015-06-04 Gilles Louppe

Random Forest Variable Importance-based Selection Algorithm in Class Imbalance Problem

Random Forest is a machine learning method that offers many advantages, including the ability to easily measure variable importance. Class balancing technique is a well-known solution to deal with class imbalance problem. However, it has…

Machine Learning · Statistics 2023-12-19 Yunbi Nam, Sunwoo Han

Correlation and variable importance in random forests

This paper is about variable selection with the random forests algorithm in presence of correlated predictors. In high-dimensional regression or classification frameworks, variable selection is a difficult task, that becomes even more…

Methodology · Statistics 2016-04-19 Baptiste Gregorutti, Bertrand Michel, Philippe Saint-Pierre

Enhancing Variable Importance in Random Forests: A Novel Application of Global Sensitivity Analysis

The present work provides an application of Global Sensitivity Analysis to supervised machine learning methods such as Random Forests. These methods act as black boxes, selecting features in high-dimensional data sets as to provide…

Machine Learning · Statistics 2024-07-22 Giulia Vannucci, Roberta Siciliano, Andrea Saltelli

Model-assisted estimation through random forests in finite population sampling

In surveys, the interest lies in estimating finite population parameters such as population totals and means. In most surveys, some auxiliary information is available at the estimation stage. This information may be incorporated in the…

Methodology · Statistics 2022-08-23 Mehdi Dagdoug, Camelia Goga, David Haziza

MMD-based Variable Importance for Distributional Random Forest

Distributional Random Forest (DRF) is a flexible forest-based method to estimate the full conditional distribution of a multivariate output of interest given input variables. In this article, we introduce a variable importance algorithm for…

Machine Learning · Statistics 2024-02-15 Clément Bénard, Jeffrey Näf, Julie Josse

Random Forests: some methodological insights

This paper examines from an experimental perspective random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001. It first aims at confirming, known but sparse,…

Machine Learning · Statistics 2008-11-24 Robin Genuer, Jean-Michel Poggi, Christine Tuleau

Prediction Error Reduction Function as a Variable Importance Score

This paper introduces and develops a novel variable importance score function in the context of ensemble learning and demonstrates its appeal both theoretically and empirically. Our proposed score function is simple and more straightforward…

Machine Learning · Statistics 2015-01-27 Ernest Fokoué

Combining clustering of variables and feature selection using random forests

Standard approaches to tackle high-dimensional supervised classification problem often include variable selection and dimension reduction procedures. The novel methodology proposed in this paper combines clustering of variables and feature…

Statistics Theory · Mathematics 2018-11-07 Marie Chavent, Robin Genuer, Jerome Saracco

Variable selection from random forests: application to gene expression data

Random forest is a classification algorithm well suited for microarray data: it shows excellent performance even when most predictive variables are noise, can be used when the number of variables is much larger than the number of…

Quantitative Methods · Quantitative Biology 2007-05-23 Ramon Diaz-Uriarte, Sara Alvarez de Andres

The All Relevant Feature Selection using Random Forest

In this paper we examine the application of the random forest classifier for the all relevant feature selection problem. To this end we first examine two recently proposed all relevant feature selection algorithms, both being a random…

Artificial Intelligence · Computer Science 2011-06-28 Miron B. Kursa, Witold R. Rudnicki

A Random Forest Guided Tour

The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their…

Statistics Theory · Mathematics 2015-11-19 Gérard Biau, Erwan Scornet

Dimension Reduction Forests: Local Variable Importance using Structured Random Forests

Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such…

Methodology · Statistics 2021-03-25 Joshua Daniel Loyal, Ruoqing Zhu, Yifan Cui, Xin Zhang

Testing for Feature Relevance: The HARVEST Algorithm

Feature selection with high-dimensional data and a very small proportion of relevant features poses a severe challenge to standard statistical methods. We have developed a new approach (HARVEST) that is straightforward to apply, albeit…

Machine Learning · Statistics 2018-03-01 Herbert Weisberg, Victor Pontes, Mathis Thoma

Trees, forests, and impurity-based variable importance

Tree ensemble methods such as random forests [Breiman, 2001] are very popular to handle high-dimensional tabular data sets, notably because of their good predictive accuracy. However, when machine learning is used for decision-making…

Statistics Theory · Mathematics 2021-12-28 Erwan Scornet

Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression

Random Forest (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data,…

Machine Learning · Statistics 2022-10-13 Domagoj Ćevid, Loris Michel, Jeffrey Näf, Nicolai Meinshausen, Peter Bühlmann

Grouping Shapley Value Feature Importances of Random Forests for explainable Yield Prediction

Explainability in yield prediction helps us fully explore the potential of machine learning models that are already able to achieve high accuracy for a variety of yield prediction scenarios. The data included for the prediction of yields…

Machine Learning · Computer Science 2023-04-17 Florian Huber, Hannes Engler, Anna Kicherer, Katja Herzog, Reinhard Töpfer, Volker Steinhage

Nonparametric Feature Selection by Random Forests and Deep Neural Networks

Random forests are a widely used machine learning algorithm, but their computational efficiency is undermined when applied to large-scale datasets with numerous instances and useless features. Herein, we propose a nonparametric feature…

Machine Learning · Computer Science 2022-01-19 Xiaojun Mao, Liuhua Peng, Zhonglei Wang