Related papers: Unity Forests: Improving Interaction Modelling and…

Multi forests: Variable importance for multi-class outcomes

In prediction tasks with multi-class outcomes, identifying covariates specifically associated with one or more outcome classes can be important. Conventional variable importance measures (VIMs) from random forests (RFs), like permutation…

Machine Learning · Statistics 2024-09-16 Roman Hornung, Alexander Hapfelmeier

Pure interaction effects unseen by Random Forests

Random Forests are widely claimed to capture interactions well. However, some simple examples suggest that they perform poorly in the presence of certain pure interactions that the conventional CART criterion struggles to capture during…

Machine Learning · Statistics 2025-08-04 Ricardo Blum, Munir Hiabu, Enno Mammen, Joseph Theo Meyer

Forest Floor Visualizations of Random Forests

We propose a novel methodology, forest floor, to visualize and interpret random forest (RF) models. RF is a popular and useful tool for non-linear multi-variate classification and regression, which yields a good trade-off between robustness…

Machine Learning · Statistics 2016-07-05 Soeren H. Welling, Hanne H. F. Refsgaard, Per B. Brockhoff, Line H. Clemmensen

Random Planted Forest: a directly interpretable tree ensemble

We introduce a novel interpretable tree based algorithm for prediction in a regression setting. Our motivation is to estimate the unknown regression function from a functional decomposition perspective in which the functional components…

Machine Learning · Statistics 2023-08-04 Munir Hiabu, Enno Mammen, Joseph T. Meyer

Trees, forests, and impurity-based variable importance

Tree ensemble methods such as random forests [Breiman, 2001] are very popular to handle high-dimensional tabular data sets, notably because of their good predictive accuracy. However, when machine learning is used for decision-making…

Statistics Theory · Mathematics 2021-12-28 Erwan Scornet

CO2 Forest: Improved Random Forest by Continuous Optimization of Oblique Splits

We propose a novel algorithm for optimizing multivariate linear threshold functions as split functions of decision trees to create improved Random Forest classifiers. Standard tree induction methods resort to sampling and exhaustive search…

Machine Learning · Computer Science 2015-06-26 Mohammad Norouzi, Maxwell D. Collins, David J. Fleet, Pushmeet Kohli

On Extreme Pruning of Random Forest Ensembles for Real-time Predictive Applications

Random Forest (RF) is an ensemble supervised machine learning technique that was developed by Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe…

Machine Learning · Computer Science 2015-03-18 Khaled Fawagreh, Mohamad Medhat Gaber, Eyad Elyan

A Random Interaction Forest for Prioritizing Predictive Biomarkers

Precision medicine is becoming a focus in medical research recently, as its implementation brings values to all stakeholders in the healthcare system. Various statistical methodologies have been developed tackling problems in different…

Quantitative Methods · Quantitative Biology 2019-10-07 Zhen Zeng, Yuefeng Lu, Judong Shen, Wei Zheng, Peter Shaw, Mary Beth Dorr

Iterative Random Forests to detect predictive and stable high-order interactions

Genomics has revolutionized biology, enabling the interrogation of whole transcriptomes, genome-wide binding sites for proteins, and many other molecular processes. However, individual genomic assays measure elements that interact in vivo…

Machine Learning · Statistics 2022-06-08 Sumanta Basu, Karl Kumbier, James B. Brown, Bin Yu

A Central Limit Theorem for the permutation importance measure

Random Forests have become a widely used tool in machine learning since their introduction in 2001, known for their strong performance in classification and regression tasks. One key feature of Random Forests is the Random Forest…

Statistics Theory · Mathematics 2025-12-18 Nico Föge, Lena Schmid, Marc Ditzhaus, Markus Pauly

Diversity Conscious Refined Random Forest

Random Forest (RF) is a widely used ensemble learning technique known for its robust classification performance across diverse domains. However, it often relies on hundreds of trees and all input features, leading to high inference cost and…

Machine Learning · Computer Science 2025-07-08 Sijan Bhattarai, Saurav Bhandari, Girija Bhusal, Saroj Shakya, Tapendra Pandey

Variable importance in binary regression trees and forests

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally…

Machine Learning · Statistics 2009-09-29 Hemant Ishwaran

Provable Recovery of Locally Important Signed Features and Interactions from Random Forest

Feature and Interaction Importance (FII) methods are essential in supervised learning for assessing the relevance of input variables and their interactions in complex prediction models. In many domains, such as personalized medicine, local…

Machine Learning · Statistics 2025-12-15 Kata Vuk, Nicolas Alexander Ihlo, Merle Behr

An Approximation Method for Fitted Random Forests

Random Forests (RF) is a popular machine learning method for classification and regression problems. It involves a bagging application to decision tree models. One of the primary advantages of the Random Forests model is the reduction in…

Machine Learning · Statistics 2022-07-06 Sai K Popuri

Generalized Random Forests using Fixed-Point Trees

We propose a computationally efficient alternative to generalized random forests (GRFs) for estimating heterogeneous effects in large dimensions. While GRFs rely on a gradient-based splitting criterion, which in large dimensions is…

Machine Learning · Statistics 2025-06-18 David Fleischer, David A. Stephens, Archer Y. Yang

Targeting predictors in random forest regression

Random forest regression (RF) is an extremely popular tool for the analysis of high-dimensional data. Nonetheless, its benefits may be lessened in sparse settings due to weak predictors, and a pre-estimation dimension reduction (targeting)…

Econometrics · Economics 2020-11-09 Daniel Borup, Bent Jesper Christensen, Nicolaj Nørgaard Mühlbach, Mikkel Slot Nielsen

Fréchet random forests for metric space valued regression with non euclidean predictors

Random forests are a statistical learning method widely used in many areas of scientific research because of its ability to learn complex relationships between input and output variables and also its capacity to handle high-dimensional…

Machine Learning · Statistics 2024-02-19 Louis Capitaine, Jérémie Bigot, Rodolphe Thiébaut, Robin Genuer

Clustered random forests with correlated data for optimal estimation and inference under potential covariate shift

We develop Clustered Random Forests, a random forests algorithm for clustered data, arising from independent groups that exhibit within-cluster dependence. The leaf-wise predictions for each decision tree making up clustered random forests…

Methodology · Statistics 2026-01-26 Elliot H. Young, Peter Bühlmann

Principled Federated Random Forests for Heterogeneous Data

Random Forests (RF) are among the most powerful and widely used predictive models for centralized tabular data, yet few methods exist to adapt them to the federated learning setting. Unlike most federated learning approaches, the…

Machine Learning · Statistics 2026-05-08 Rémi Khellaf, Erwan Scornet, Aurélien Bellet, Julie Josse

Joints in Random Forests

Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack…

Machine Learning · Computer Science 2020-11-20 Alvaro H. C. Correia, Robert Peharz, Cassio de Campos