Related papers: Dimension Reduction Forests: Local Variable Import…

MMD-based Variable Importance for Distributional Random Forest

Distributional Random Forest (DRF) is a flexible forest-based method to estimate the full conditional distribution of a multivariate output of interest given input variables. In this article, we introduce a variable importance algorithm for…

Machine Learning · Statistics 2024-02-15 Clément Bénard , Jeffrey Näf , Julie Josse

Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression

Random Forest (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data,…

Machine Learning · Statistics 2022-10-13 Domagoj Ćevid , Loris Michel , Jeffrey Näf , Nicolai Meinshausen , Peter Bühlmann

Local Linear Forests

Random forests are a powerful method for non-parametric regression, but are limited in their ability to fit smooth signals, and can show poor predictive performance in the presence of strong, smooth effects. Taking the perspective of random…

Machine Learning · Statistics 2020-09-08 Rina Friedberg , Julie Tibshirani , Susan Athey , Stefan Wager

Trees, forests, and impurity-based variable importance

Tree ensemble methods such as random forests [Breiman, 2001] are very popular to handle high-dimensional tabular data sets, notably because of their good predictive accuracy. However, when machine learning is used for decision-making…

Statistics Theory · Mathematics 2021-12-28 Erwan Scornet

Understanding Random Forests: From Theory to Practice

Data analysis and machine learning have become an integrative part of the modern scientific methodology, offering automated procedures for the prediction of a phenomenon based on past observations, unraveling underlying patterns in data and…

Machine Learning · Statistics 2015-06-04 Gilles Louppe

Generalized Random Forests

We propose generalized random forests, a method for non-parametric statistical estimation based on random forests (Breiman, 2001) that can be used to fit any quantity of interest identified as the solution to a set of local moment…

Methodology · Statistics 2018-04-06 Susan Athey , Julie Tibshirani , Stefan Wager

From global to local MDI variable importances for random forests and when they are Shapley values

Random forests have been widely used for their ability to provide so-called importance measures, which give insight at a global (per dataset) level on the relevance of input variables to predict a certain output. On the other hand, methods…

Machine Learning · Statistics 2021-11-04 Antonio Sutera , Gilles Louppe , Van Anh Huynh-Thu , Louis Wehenkel , Pierre Geurts

(f)RFCDE: Random Forests for Conditional Density Estimation and Functional Data

Random forests is a common non-parametric regression technique which performs well for mixed-type unordered data and irrelevant features, while being robust to monotonic variable transformations. Standard random forests, however, do not…

Computation · Statistics 2019-06-19 Taylor Pospisil , Ann B. Lee

Nonparametric Feature Selection by Random Forests and Deep Neural Networks

Random forests are a widely used machine learning algorithm, but their computational efficiency is undermined when applied to large-scale datasets with numerous instances and useless features. Herein, we propose a nonparametric feature…

Machine Learning · Computer Science 2022-01-19 Xiaojun Mao , Liuhua Peng , Zhonglei Wang

Variable importance for causal forests: breaking down the heterogeneity of treatment effects

Causal random forests provide efficient estimates of heterogeneous treatment effects. However, forest algorithms are also well-known for their black-box nature, and therefore, do not characterize how input variables are involved in…

Machine Learning · Statistics 2023-08-08 Clément Bénard , Julie Josse

Interpreting Deep Forest through Feature Contribution and MDI Feature Importance

Deep forest is a non-differentiable deep model which has achieved impressive empirical success across a wide variety of applications, especially on categorical/symbolic or mixed modeling tasks. Many of the application fields prefer…

Machine Learning · Computer Science 2023-05-02 Yi-Xiao He , Shen-Huan Lyu , Yuan Jiang

Improving Random Forests by Smoothing

Random forest regression is a powerful non-parametric method that adapts to local data characteristics through data-driven partitioning, making it effective across diverse application domains. However, the piecewise constant nature of…

Machine Learning · Computer Science 2026-05-19 Ziyi Liu , Phuc Luong , Mario Boley , Daniel F. Schmidt

Achieving Reliable Causal Inference with Data-Mined Variables: A Random Forest Approach to the Measurement Error Problem

Combining machine learning with econometric analysis is becoming increasingly prevalent in both research and practice. A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest…

Econometrics · Economics 2020-12-22 Mochen Yang , Edward McFowland , Gordon Burtch , Gediminas Adomavicius

Distributional Adaptive Soft Regression Trees

Random forests are an ensemble method relevant for many problems, such as regression or classification. They are popular due to their good predictive performance (compared to, e.g., decision trees) requiring only minimal tuning of…

Methodology · Statistics 2022-10-20 Nikolaus Umlauf , Nadja Klein

Variable importance in binary regression trees and forests

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally…

Machine Learning · Statistics 2009-09-29 Hemant Ishwaran

Making Sense of Random Forest Probabilities: a Kernel Perspective

A random forest is a popular tool for estimating probabilities in machine learning classification tasks. However, the means by which this is accomplished is unprincipled: one simply counts the fraction of trees in a forest that vote for a…

Machine Learning · Statistics 2018-12-17 Matthew A. Olson , Abraham J. Wyner

Nonparametric intensity estimation of spatial point processes by random forests

We propose a random forest estimator for the intensity of spatial point processes, applicable with or without covariates. It retains the well-known advantages of a random forest approach, including the ability to handle a large number of…

Methodology · Statistics 2025-11-13 Christophe Biscio , Frédéric Lavancier

An Approximation Method for Fitted Random Forests

Random Forests (RF) is a popular machine learning method for classification and regression problems. It involves a bagging application to decision tree models. One of the primary advantages of the Random Forests model is the reduction in…

Machine Learning · Statistics 2022-07-06 Sai K Popuri

Variable selection from random forests: application to gene expression data

Random forest is a classification algorithm well suited for microarray data: it shows excellent performance even when most predictive variables are noise, can be used when the number of variables is much larger than the number of…

Quantitative Methods · Quantitative Biology 2007-05-23 Ramon Diaz-Uriarte , Sara Alvarez de Andres

Fr\'echet random forests for metric space valued regression with non euclidean predictors

Random forests are a statistical learning method widely used in many areas of scientific research because of its ability to learn complex relationships between input and output variables and also its capacity to handle high-dimensional…

Machine Learning · Statistics 2024-02-19 Louis Capitaine , Jérémie Bigot , Rodolphe Thiébaut , Robin Genuer