Related papers: Multi forests: Variable importance for multi-class…

Unity Forests: Improving Interaction Modelling and Interpretability in Random Forests

Random forests (RFs) are widely used for prediction and variable importance analysis and are often believed to capture any type of interaction via recursive splitting. However, since the splits are chosen locally, interactions are only…

Methodology · Statistics 2026-01-13 Roman Hornung, Alexander Hapfelmeier

Variable importance in binary regression trees and forests

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally…

Machine Learning · Statistics 2009-09-29 Hemant Ishwaran

Random Forest Variable Importance-based Selection Algorithm in Class Imbalance Problem

Random Forest is a machine learning method that offers many advantages, including the ability to easily measure variable importance. Class balancing is a well-known solution to the class imbalance problem. However, it has…

Machine Learning · Statistics 2023-12-19 Yunbi Nam, Sunwoo Han

MMD-based Variable Importance for Distributional Random Forest

Distributional Random Forest (DRF) is a flexible forest-based method to estimate the full conditional distribution of a multivariate output of interest given input variables. In this article, we introduce a variable importance algorithm for…

Machine Learning · Statistics 2024-02-15 Clément Bénard, Jeffrey Näf, Julie Josse

Opening the random forest black box by the analysis of the mutual impact of features

Random forest is a popular machine learning approach for the analysis of high-dimensional data because it is flexible and provides variable importance measures for the selection of relevant features. However, the complex relationships…

Machine Learning · Computer Science 2023-08-07 Lucas F. Voges, Lukas C. Jarren, Stephan Seifert

Covariance regression with random forests

Capturing the conditional covariances or correlations among the elements of a multivariate response vector based on covariates is important to various fields including neuroscience, epidemiology and biomedicine. We propose a new method…

Methodology · Statistics 2023-05-12 Cansu Alakus, Denis Larocque, Aurelie Labbe

All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously

Variable importance (VI) tools describe how much covariates contribute to a prediction model's accuracy. However, important variables for one well-performing model (for example, a linear model $f(\mathbf{x})=\mathbf{x}^{T}\beta$ with a…

Methodology · Statistics 2019-12-24 Aaron Fisher, Cynthia Rudin, Francesca Dominici

Trees, forests, and impurity-based variable importance

Tree ensemble methods such as random forests [Breiman, 2001] are very popular for handling high-dimensional tabular data sets, notably because of their good predictive accuracy. However, when machine learning is used for decision-making…

Statistics Theory · Mathematics 2021-12-28 Erwan Scornet

CO2 Forest: Improved Random Forest by Continuous Optimization of Oblique Splits

We propose a novel algorithm for optimizing multivariate linear threshold functions as split functions of decision trees to create improved Random Forest classifiers. Standard tree induction methods resort to sampling and exhaustive search…

Machine Learning · Computer Science 2015-06-26 Mohammad Norouzi, Maxwell D. Collins, David J. Fleet, Pushmeet Kohli

Diversity Conscious Refined Random Forest

Random Forest (RF) is a widely used ensemble learning technique known for its robust classification performance across diverse domains. However, it often relies on hundreds of trees and all input features, leading to high inference cost and…

Machine Learning · Computer Science 2025-07-08 Sijan Bhattarai, Saurav Bhandari, Girija Bhusal, Saroj Shakya, Tapendra Pandey

A Mathematical Programming Approach to Optimal Classification Forests

This paper introduces Weighted Optimal Classification Forests (WOCFs), a new family of classifiers that takes advantage of an optimal ensemble of decision trees to derive accurate and interpretable classifiers. We propose a novel…

Optimization and Control · Mathematics 2024-12-02 Víctor Blanco, Alberto Japón, Justo Puerto, Peter Zhang

Machine Learning for Multi-Output Regression: When should a holistic multivariate approach be preferred over separate univariate ones?

Tree-based ensembles such as the Random Forest are modern classics among statistical learning methods. In particular, they are used for predicting univariate responses. In case of multiple outputs the question arises whether we separately…

Machine Learning · Statistics 2022-01-17 Lena Schmid, Alexander Gerharz, Andreas Groll, Markus Pauly

Correlation and variable importance in random forests

This paper is about variable selection with the random forests algorithm in the presence of correlated predictors. In high-dimensional regression or classification frameworks, variable selection is a difficult task that becomes even more…

Methodology · Statistics 2016-04-19 Baptiste Gregorutti, Bertrand Michel, Philippe Saint-Pierre

Variable importance for causal forests: breaking down the heterogeneity of treatment effects

Causal random forests provide efficient estimates of heterogeneous treatment effects. However, forest algorithms are also well-known for their black-box nature, and therefore, do not characterize how input variables are involved in…

Machine Learning · Statistics 2023-08-08 Clément Bénard, Julie Josse

Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression

Random Forest (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data,…

Machine Learning · Statistics 2022-10-13 Domagoj Ćevid, Loris Michel, Jeffrey Näf, Nicolai Meinshausen, Peter Bühlmann

Grouped variable importance with random forests and application to multiple functional data analysis

The selection of grouped variables using the random forest algorithm is considered. First a new importance measure adapted for groups of variables is proposed. Theoretical insights into this criterion are given for additive regression…

Methodology · Statistics 2015-05-20 Baptiste Gregorutti, Bertrand Michel, Philippe Saint-Pierre

The Power of Unbiased Recursive Partitioning: A Unifying View of CTree, MOB, and GUIDE

A core step of every algorithm for learning regression trees is the selection of the best splitting variable from the available covariates and the corresponding split point. Early tree algorithms (e.g., AID, CART) employed greedy search…

Methodology · Statistics 2019-06-26 Lisa Schlosser, Torsten Hothorn, Achim Zeileis

Integrating Random Forests and Generalized Linear Models for Improved Accuracy and Interpretability

Random forests (RFs) are among the most popular supervised learning algorithms due to their nonlinear flexibility and ease-of-use. However, as black box models, they can only be interpreted via algorithmically-defined feature importance…

Methodology · Statistics 2025-05-26 Abhineet Agarwal, Ana M. Kenney, Yan Shuo Tan, Tiffany M. Tang, Bin Yu

Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection

Gradient Boosting Machines (GBM) are among the go-to algorithms for tabular data and produce state-of-the-art results in many prediction tasks. Despite its popularity, the GBM framework suffers from a fundamental flaw in its base…

Machine Learning · Computer Science 2021-09-14 Afek Ilay Adler, Amichai Painsky

Understanding Random Forests: From Theory to Practice

Data analysis and machine learning have become an integral part of the modern scientific methodology, offering automated procedures for the prediction of a phenomenon based on past observations, unraveling underlying patterns in data and…

Machine Learning · Statistics 2015-06-04 Gilles Louppe