Related papers: Random Forests for Big Data

Consistency of random forests

Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (2001) 5--32] that combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical…

Statistics Theory · Mathematics 2015-08-11 Erwan Scornet , Gérard Biau , Jean-Philippe Vert

A Random Forest Guided Tour

The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their…

Statistics Theory · Mathematics 2015-11-19 Gérard Biau , Erwan Scornet

Random Forests: some methodological insights

This paper examines from an experimental perspective random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001. It first aims at confirming, known but sparse,…

Machine Learning · Statistics 2008-11-24 Robin Genuer , Jean-Michel Poggi , Christine Tuleau

Analysis of a Random Forests Model

Random forests are a scheme proposed by Leo Breiman in the 2000's for building a predictor ensemble with a set of decision trees that grow in randomly selected subspaces of data. Despite growing interest and practical use, there has been…

Machine Learning · Statistics 2012-03-28 Gérard Biau

Random forests for high-dimensional longitudinal data

Random forests is a state-of-the-art supervised machine learning method which behaves well in high-dimensional settings although some limitations may happen when $p$, the number of predictors, is much larger than the number of observations…

Methodology · Statistics 2019-02-01 Louis Capitaine , Robin Genuer , Rodolphe Thiébaut

Neural Random Forests

Given an ensemble of randomized regression trees, it is possible to restructure them as a collection of multilayered neural networks with particular connection weights. Following this principle, we reformulate the random forest method of…

Machine Learning · Statistics 2018-04-04 Gérard Biau , Erwan Scornet , Johannes Welbl

Random Similarity Forests

The wealth of data being gathered about humans and their surroundings drives new machine learning applications in various fields. Consequently, more and more often, classifiers are trained using not only numerical data but also complex data…

Machine Learning · Computer Science 2022-04-13 Maciej Piernik , Dariusz Brzezinski , Pawel Zawadzki

Large Random Forests: Optimisation for Rapid Evaluation

Random Forests are one of the most popular classifiers in machine learning. The larger they are, the more precise is the outcome of their predictions. However, this comes at a cost: their running time for classification grows linearly with…

Machine Learning · Computer Science 2019-12-24 Frederik Gossen , Bernhard Steffen

Arbres CART et For\^ets al\'eatoires, Importance et s\'election de variables

Two algorithms proposed by Leo Breiman : CART trees (Classification And Regression Trees for) introduced in the first half of the 80s and random forests emerged, meanwhile, in the early 2000s, are the subject of this article. The goal is to…

Methodology · Statistics 2017-01-23 Robin Genuer , Jean-Michel Poggi

ggRandomForests: Visually Exploring a Random Forest for Regression

Random Forests [Breiman:2001] (RF) are a fully non-parametric statistical method requiring no distributional assumptions on covariate relation to the response. RF are a robust, nonlinear technique that optimizes predictive accuracy by…

Computation · Statistics 2016-12-30 John Ehrlinger

Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression

Random Forest (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data,…

Machine Learning · Statistics 2022-10-13 Domagoj Ćevid , Loris Michel , Jeffrey Näf , Nicolai Meinshausen , Peter Bühlmann

A Random Forest Approach for Modeling Bounded Outcomes

Random forests have become an established tool for classification and regression, in particular in high-dimensional settings and in the presence of complex predictor-response relationships. For bounded outcome variables restricted to the…

Methodology · Statistics 2019-01-21 Leonie Weinhold , Matthias Schmid , Marvin N. Wright , Moritz Berger

Fr\'echet random forests for metric space valued regression with non euclidean predictors

Random forests are a statistical learning method widely used in many areas of scientific research because of its ability to learn complex relationships between input and output variables and also its capacity to handle high-dimensional…

Machine Learning · Statistics 2024-02-19 Louis Capitaine , Jérémie Bigot , Rodolphe Thiébaut , Robin Genuer

Lassoed Forests: Random Forests with Adaptive Lasso Post-selection

Random forests are a statistical learning technique that use bootstrap aggregation to average high-variance and low-bias trees. Improvements to random forests, such as applying Lasso regression to the tree predictions, have been proposed in…

Machine Learning · Statistics 2025-11-13 Jing Shang , James Bannon , Benjamin Haibe-Kains , Robert Tibshirani

Mondrian Forests: Efficient Online Random Forests

Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics. Random forests achieve competitive predictive performance and are…

Machine Learning · Statistics 2015-02-17 Balaji Lakshminarayanan , Daniel M. Roy , Yee Whye Teh

Impact of subsampling and pruning on random forests

Random forests are ensemble learning methods introduced by Breiman (2001) that operate by averaging several decision trees built on a randomly selected subspace of the data set. Despite their widespread use in practice, the respective roles…

Statistics Theory · Mathematics 2016-03-15 Roxane Duroux , Erwan Scornet

Models under which random forests perform badly; consequences for applications

We give examples of data-generating models under which Breiman's random forest may be extremely slow to converge to the optimal predictor or even fail to be consistent. The evidence provided for these properties is based on mostly intuitive…

Machine Learning · Statistics 2021-12-01 José A. Ferreira

Improved Weighted Random Forest for Classification Problems

Several studies have shown that combining machine learning models in an appropriate way will introduce improvements in the individual predictions made by the base models. The key to make well-performing ensemble model is in the diversity of…

Machine Learning · Computer Science 2021-03-01 Mohsen Shahhosseini , Guiping Hu

Distributional Adaptive Soft Regression Trees

Random forests are an ensemble method relevant for many problems, such as regression or classification. They are popular due to their good predictive performance (compared to, e.g., decision trees) requiring only minimal tuning of…

Methodology · Statistics 2022-10-20 Nikolaus Umlauf , Nadja Klein

Understanding Random Forests: From Theory to Practice

Data analysis and machine learning have become an integrative part of the modern scientific methodology, offering automated procedures for the prediction of a phenomenon based on past observations, unraveling underlying patterns in data and…

Machine Learning · Statistics 2015-06-04 Gilles Louppe