Related papers: Locally Optimized Random Forests

Clustered random forests with correlated data for optimal estimation and inference under potential covariate shift

We develop Clustered Random Forests, a random forests algorithm for clustered data, arising from independent groups that exhibit within-cluster dependence. The leaf-wise predictions for each decision tree making up clustered random forests…

Methodology · Statistics 2026-01-26 Elliot H. Young, Peter Bühlmann

Lassoed Forests: Random Forests with Adaptive Lasso Post-selection

Random forests are a statistical learning technique that use bootstrap aggregation to average high-variance and low-bias trees. Improvements to random forests, such as applying Lasso regression to the tree predictions, have been proposed in…

Machine Learning · Statistics 2025-11-13 Jing Shang, James Bannon, Benjamin Haibe-Kains, Robert Tibshirani

Example-Based Explanations of Random Forest Predictions

A random forest prediction can be computed by the scalar product of the labels of the training examples and a set of weights that are determined by the leafs of the forest into which the test object falls; each prediction can hence be…

Machine Learning · Computer Science 2023-11-27 Henrik Boström

Local Linear Forests

Random forests are a powerful method for non-parametric regression, but are limited in their ability to fit smooth signals, and can show poor predictive performance in the presence of strong, smooth effects. Taking the perspective of random…

Machine Learning · Statistics 2020-09-08 Rina Friedberg, Julie Tibshirani, Susan Athey, Stefan Wager

Localized Uncertainty Quantification in Random Forests via Proximities

In machine learning, uncertainty quantification helps assess the reliability of model predictions, which is important in high-stakes scenarios. Traditional approaches often emphasize predictive accuracy, but there is a growing focus on…

Machine Learning · Statistics 2025-09-30 Jake S. Rhodes, Scott D. Brown, J. Riley Wilkinson

Achieving Reliable Causal Inference with Data-Mined Variables: A Random Forest Approach to the Measurement Error Problem

Combining machine learning with econometric analysis is becoming increasingly prevalent in both research and practice. A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest…

Econometrics · Economics 2020-12-22 Mochen Yang, Edward McFowland, Gordon Burtch, Gediminas Adomavicius

Random forests for high-dimensional longitudinal data

Random forests is a state-of-the-art supervised machine learning method which behaves well in high-dimensional settings although some limitations may happen when $p$, the number of predictors, is much larger than the number of observations…

Methodology · Statistics 2019-02-01 Louis Capitaine, Robin Genuer, Rodolphe Thiébaut

Best-scored Random Forest Classification

We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates…

Machine Learning · Statistics 2019-05-28 Hanyuan Hang, Xiaoyu Liu, Ingo Steinwart

Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression

Random Forest (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data,…

Machine Learning · Statistics 2022-10-13 Domagoj Ćevid, Loris Michel, Jeffrey Näf, Nicolai Meinshausen, Peter Bühlmann

A Random Forest Approach for Modeling Bounded Outcomes

Random forests have become an established tool for classification and regression, in particular in high-dimensional settings and in the presence of complex predictor-response relationships. For bounded outcome variables restricted to the…

Methodology · Statistics 2019-01-21 Leonie Weinhold, Matthias Schmid, Marvin N. Wright, Moritz Berger

Distributional Adaptive Soft Regression Trees

Random forests are an ensemble method relevant for many problems, such as regression or classification. They are popular due to their good predictive performance (compared to, e.g., decision trees) requiring only minimal tuning of…

Methodology · Statistics 2022-10-20 Nikolaus Umlauf, Nadja Klein

Best-scored Random Forest Density Estimation

This paper presents a brand new nonparametric density estimation strategy named the best-scored random forest density estimation whose effectiveness is supported by both solid theoretical analysis and significant experimental performance.…

Machine Learning · Statistics 2019-05-10 Hanyuan Hang, Hongwei Wen

Improving Random Forests by Smoothing

Random forest regression is a powerful non-parametric method that adapts to local data characteristics through data-driven partitioning, making it effective across diverse application domains. However, the piecewise constant nature of…

Machine Learning · Computer Science 2026-05-19 Ziyi Liu, Phuc Luong, Mario Boley, Daniel F. Schmidt

Explainable Unsupervised Anomaly Detection with Random Forest

We describe the use of an unsupervised Random Forest for similarity learning and improved unsupervised anomaly detection. By training a Random Forest to discriminate between real data and synthetic data sampled from a uniform distribution…

Machine Learning · Statistics 2025-04-23 Joshua S. Harvey, Joshua Rosaler, Mingshu Li, Dhruv Desai, Dhagash Mehta

Transformation Forests

Regression models for supervised learning problems with a continuous target are commonly understood as models for the conditional mean of the target given predictors. This notion is simple and therefore appealing for interpretation and…

Methodology · Statistics 2018-01-09 Torsten Hothorn, Achim Zeileis

Consistency of random forests

Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (2001) 5--32] that combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical…

Statistics Theory · Mathematics 2015-08-11 Erwan Scornet, Gérard Biau, Jean-Philippe Vert

Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased

When using machine learning for imbalanced binary classification problems, it is common to subsample the majority class to create a (more) balanced training dataset. This biases the model's predictions because the model learns from data…

Machine Learning · Computer Science 2025-11-03 Nathan Phelps, Daniel J. Lizotte, Douglas G. Woolford

Optimal Weighted Random Forests

The random forest (RF) algorithm has become a very popular prediction method for its great flexibility and promising accuracy. In RF, it is conventional to put equal weights on all the base learners (trees) to aggregate their predictions.…

Machine Learning · Statistics 2023-05-18 Xinyu Chen, Dalei Yu, Xinyu Zhang

Uncertain Trees: Dealing with Uncertain Inputs in Regression Trees

Tree-based ensemble methods, as Random Forests and Gradient Boosted Trees, have been successfully used for regression in many applications and research studies. Furthermore, these methods have been extended in order to deal with uncertainty…

Machine Learning · Computer Science 2018-11-20 Myriam Tami, Marianne Clausel, Emilie Devijver, Adrien Dulac, Eric Gaussier, Stefan Janaqi, Meriam Chebre

A Domain-Region Based Evaluation of ML Performance Robustness to Covariate Shift

Most machine learning methods assume that the input data distribution is the same in the training and testing phases. However, in practice, this stationarity is usually not met and the distribution of inputs differs, leading to unexpected…

Machine Learning · Computer Science 2023-04-19 Firas Bayram, Bestoun S. Ahmed