Related papers: Adaptive Split Balancing for Optimal Random Forest
We study contextual stochastic optimization problems, where we leverage rich auxiliary observations (e.g., product characteristics) to improve decision making with uncertain variables (e.g., demand). We show how to train forest decision…
Random forests are a popular class of algorithms used for regression and classification. The algorithm introduced by Breiman in 2001 and many of its variants are ensembles of randomized decision trees built from axis-aligned partitions of…
We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates…
In this paper, we introduce a collaborative training algorithm of balanced random forests with convolutional neural networks for domain adaptation tasks. In real scenarios, most domain adaptation algorithms face the challenges from noisy,…
Classification of functional data where observations are curves or trajectories poses unique challenges, particularly under severe class imbalance. Traditional Random Forest algorithms, while robust for tabular data, often fail to capture…
Random Forests (RF) and Extreme Gradient Boosting (XGBoost) are two of the most widely used and highly performing classification and regression models. They aggregate equally weighted CART trees, generated randomly in RF or sequentially in…
Random forests are an ensemble method relevant for many problems, such as regression or classification. They are popular due to their good predictive performance (compared to, e.g., decision trees) requiring only minimal tuning of…
Random forests have become an established tool for classification and regression, in particular in high-dimensional settings and in the presence of complex predictor-response relationships. For bounded outcome variables restricted to the…
We study predictive probability inference in classification tasks using random forests under class imbalance. We focus on two simplified variants of Breiman's algorithm, namely subsampling Infinite Random Forests (IRFs) and under-sampling…
Bayesian Additive Regression Trees [BART, Chipman et al., 2010] have gained significant popularity due to their remarkable predictive performance and ability to quantify uncertainty. However, standard decision tree models rely on recursive…
Introduced by Breiman, Random Forests are widely used classification and regression algorithms. While being initially designed as batch algorithms, several variants have been proposed to handle online learning. One particular instance of…
As a flexible nonparametric learning tool, the random forests algorithm has been widely applied to various real applications with appealing empirical performance, even in the presence of high-dimensional feature space. Unveiling the…
There are many approaches for training decision trees. This work introduces a novel gradient-based method for constructing decision trees that optimize arbitrary differentiable loss functions, overcoming the limitations of heuristic…
We propose a novel algorithm for optimizing multivariate linear threshold functions as split functions of decision trees to create improved Random Forest classifiers. Standard tree induction methods resort to sampling and exhaustive search…
We demonstrate that adaptively controlling the size of individual regression trees in a random forest can improve predictive performance, contrary to the conventional wisdom that trees should be fully grown. A fast pruning algorithm,…
Random forest (RF) stands out as a highly favored machine learning approach for classification problems. The effectiveness of RF hinges on two key factors: the accuracy of individual trees and the diversity among them. In this study, we…
Algorithms for binary classification based on adaptive tree partitioning are formulated and analyzed for both their risk performance and their friendliness to numerical implementation. The algorithms can be viewed as generating a set…
Continual learning based on data stream mining deals with ubiquitous sources of Big Data arriving at high-velocity and in real-time. Adaptive Random Forest (ARF) is a popular ensemble method used for continual learning due to its…
Phylogenetic trees are leaf-labelled trees used to model the evolution of species. In practice it is not uncommon to obtain two topologically distinct trees for the same set of species, and this motivates the use of distance measures to…
We study the convergence of the predictive surface of regression trees and forests. To support our analysis we introduce a notion of adaptive concentration for regression trees. This approach breaks tree training into a model selection…