Related papers: Locally Optimized Random Forests
We develop Clustered Random Forests, a random forests algorithm for clustered data, arising from independent groups that exhibit within-cluster dependence. The leaf-wise predictions for each decision tree making up clustered random forests…
Random forests are a statistical learning technique that use bootstrap aggregation to average high-variance and low-bias trees. Improvements to random forests, such as applying Lasso regression to the tree predictions, have been proposed in…
A random forest prediction can be computed by the scalar product of the labels of the training examples and a set of weights that are determined by the leafs of the forest into which the test object falls; each prediction can hence be…
Random forests are a powerful method for non-parametric regression, but are limited in their ability to fit smooth signals, and can show poor predictive performance in the presence of strong, smooth effects. Taking the perspective of random…
In machine learning, uncertainty quantification helps assess the reliability of model predictions, which is important in high-stakes scenarios. Traditional approaches often emphasize predictive accuracy, but there is a growing focus on…
Combining machine learning with econometric analysis is becoming increasingly prevalent in both research and practice. A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest…
Random forests is a state-of-the-art supervised machine learning method which behaves well in high-dimensional settings although some limitations may happen when $p$, the number of predictors, is much larger than the number of observations…
We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates…
Random Forest (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data,…
Random forests have become an established tool for classification and regression, in particular in high-dimensional settings and in the presence of complex predictor-response relationships. For bounded outcome variables restricted to the…
Random forests are an ensemble method relevant for many problems, such as regression or classification. They are popular due to their good predictive performance (compared to, e.g., decision trees) requiring only minimal tuning of…
This paper presents a brand new nonparametric density estimation strategy named the best-scored random forest density estimation whose effectiveness is supported by both solid theoretical analysis and significant experimental performance.…
Random forest regression is a powerful non-parametric method that adapts to local data characteristics through data-driven partitioning, making it effective across diverse application domains. However, the piecewise constant nature of…
We describe the use of an unsupervised Random Forest for similarity learning and improved unsupervised anomaly detection. By training a Random Forest to discriminate between real data and synthetic data sampled from a uniform distribution…
Regression models for supervised learning problems with a continuous target are commonly understood as models for the conditional mean of the target given predictors. This notion is simple and therefore appealing for interpretation and…
Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (2001) 5--32] that combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical…
When using machine learning for imbalanced binary classification problems, it is common to subsample the majority class to create a (more) balanced training dataset. This biases the model's predictions because the model learns from data…
The random forest (RF) algorithm has become a very popular prediction method for its great flexibility and promising accuracy. In RF, it is conventional to put equal weights on all the base learners (trees) to aggregate their predictions.…
Tree-based ensemble methods, as Random Forests and Gradient Boosted Trees, have been successfully used for regression in many applications and research studies. Furthermore, these methods have been extended in order to deal with uncertainty…
Most machine learning methods assume that the input data distribution is the same in the training and testing phases. However, in practice, this stationarity is usually not met and the distribution of inputs differs, leading to unexpected…