Related papers: When does Subagging Work?

When do Random Forests work?

We study the effectiveness of randomizing split-directions in random forests. Prior literature has shown that, on the one hand, randomization can reduce variance through decorrelation, and, on the other hand, randomization regularizes and…

Machine Learning · Statistics 2025-04-18 C. Revelas, O. Boldea, B. J. M. Werker

The Effect of Heteroscedasticity on Regression Trees

Regression trees are becoming increasingly popular as omnibus predicting tools and as the basis of numerous modern statistical learning ensembles. Part of their popularity is their ability to create a regression prediction without ever…

Machine Learning · Statistics 2016-06-17 Will Ruth, Thomas Loughin

Scalable subsampling: computation, aggregation and inference

Subsampling is a general statistical method developed in the 1990s aimed at estimating the sampling distribution of a statistic $\hat{\theta}_n$ in order to conduct nonparametric inference such as the construction of confidence intervals…

Statistics Theory · Mathematics 2021-12-14 Dimitris N. Politis

Controlling the False Split Rate in Tree-Based Aggregation

In many domains, data measurements can naturally be associated with the leaves of a tree, expressing the relationships among these measurements. For example, companies belong to industries, which in turn belong to ever coarser divisions…

Methodology · Statistics 2021-08-12 Simeng Shao, Jacob Bien, Adel Javanmard

Impact of subsampling and pruning on random forests

Random forests are ensemble learning methods introduced by Breiman (2001) that operate by averaging several decision trees built on a randomly selected subspace of the data set. Despite their widespread use in practice, the respective roles…

Statistics Theory · Mathematics 2016-03-15 Roxane Duroux, Erwan Scornet

Experiments with Optimal Model Trees

Model trees provide an appealing way to perform interpretable machine learning for both classification and regression problems. In contrast to "classic" decision trees with constant values in their leaves, model trees can use linear…

Machine Learning · Computer Science 2026-03-11 Sabino Francesco Roselli, Eibe Frank

Distributional Adaptive Soft Regression Trees

Random forests are an ensemble method relevant for many problems, such as regression or classification. They are popular due to their good predictive performance (compared to, e.g., decision trees) requiring only minimal tuning of…

Methodology · Statistics 2022-10-20 Nikolaus Umlauf, Nadja Klein

Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests

We study the often overlooked phenomenon, first noted in Breiman (2001), that random forests appear to reduce bias compared to bagging. Motivated by an interesting paper by Mentch and Zhou (2020), where the authors explain…

Machine Learning · Statistics 2025-07-23 Brian Liu, Rahul Mazumder

Consistency of survival tree and forest models: splitting bias and correction

Random survival forest and survival trees are popular models in statistics and machine learning. However, there is a lack of general understanding regarding consistency, splitting rules and influence of the censoring mechanism. In this…

Statistics Theory · Mathematics 2019-02-05 Yifan Cui, Ruoqing Zhu, Mai Zhou, Michael Kosorok

Subsampling scaling: a theory about inference from partly observed systems

In real-world applications, observations are often constrained to a small fraction of a system. Such spatial subsampling can be caused by the inaccessibility or the sheer size of the system, and cannot be overcome by longer sampling.…

Data Analysis, Statistics and Probability · Physics 2017-06-02 Anna Levina, Viola Priesemann

Towards Optimal Neural Networks: the Role of Sample Splitting in Hyperparameter Selection

While artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization…

Machine Learning · Statistics 2023-10-06 Shijin Gong, Xinyu Zhang

Adaptive Concentration of Regression Trees, with Application to Random Forests

We study the convergence of the predictive surface of regression trees and forests. To support our analysis we introduce a notion of adaptive concentration for regression trees. This approach breaks tree training into a model selection…

Statistics Theory · Mathematics 2016-05-03 Stefan Wager, Guenther Walther

Linear Aggregation in Tree-based Estimators

Regression trees and their ensemble methods are popular methods for nonparametric regression: they combine strong predictive performance with interpretable estimators. To improve their utility for locally smooth response surfaces, we study…

Methodology · Statistics 2021-09-13 Sören R. Künzel, Theo F. Saarinen, Edward W. Liu, Jasjeet S. Sekhon

Data Selection: A General Principle for Building Small Interpretable Models

We present convincing empirical evidence for an effective and general strategy for building accurate small models. Such models are attractive for interpretability and also find use in resource-constrained environments. The strategy is to…

Machine Learning · Computer Science 2024-04-30 Abhishek Ghose

Efficient non-greedy optimization of decision trees

Decision trees and randomized forests are widely used in computer vision and machine learning. Standard algorithms for decision tree induction optimize the split functions one node at a time according to some splitting criteria. This greedy…

Machine Learning · Computer Science 2015-11-13 Mohammad Norouzi, Maxwell D. Collins, Matthew Johnson, David J. Fleet, Pushmeet Kohli

Improving the precision of classification trees

Besides serving as prediction models, classification trees are useful for finding important predictor variables and identifying interesting subgroups in the data. These functions can be compromised by weak split selection algorithms that…

Applications · Statistics 2010-11-03 Wei-Yin Loh

Multiple decision trees

This paper describes experiments, on two domains, to investigate the effect of averaging over predictions of multiple decision trees, instead of using a single tree. Other authors have pointed out theoretical and commonsense reasons for…

Machine Learning · Computer Science 2013-04-10 Suk Wah Kwok, Chris Carter

Subbagging Variable Selection for Big Data

This article introduces a subbagging (subsample aggregating) approach for variable selection in regression within the context of big data. The proposed subbagging approach not only ensures that variable selection is scalable given the…

Methodology · Statistics 2025-03-10 Xian Li, Xuan Liang, Tao Zou

Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers

Despite their remarkable effectiveness and broad application, the drivers of success underlying ensembles of trees are still not fully understood. In this paper, we highlight how interpreting tree ensembles as adaptive and self-regularizing…

Machine Learning · Statistics 2024-02-05 Alicia Curth, Alan Jeffares, Mihaela van der Schaar

Evaluation of the relative performance of the subflattenings method for phylogenetic inference

The algebraic properties of flattenings and subflattenings provide direct methods for identifying edges in the true phylogeny -- and by extension the complete tree -- using pattern counts from a sequence alignment. The relatively small…

Populations and Evolution · Quantitative Biology 2022-05-06 Joshua Stevenson, Barbara Holland, Michael Charleston, Jeremy Sumner