Improving Random Forests by Smoothing
Abstract
Random forest regression is a powerful non-parametric method that adapts to local data characteristics through data-driven partitioning, making it effective across diverse application domains. However, random forest predictions are piecewise constant, so the prediction in each partition cell is made independently, ignoring potential smoothness in the underlying function. Particularly in the small-data regime, this lack of information sharing across the input space can lead to suboptimal performance. In this work, we propose a kernel-based smoothing mechanism that enhances random forests by introducing local regularity into their predictions while preserving their adaptive partitioning capabilities. Our approach applies kernel smoothing to the piecewise constant outputs of random forests, combining the adaptability of tree-based methods with the smoothness assumptions of kernel methods. We show that this smoothing procedure can be interpreted as capturing the variability (uncertainty) in the tree cut points under resampling of the training inputs. Empirical results demonstrate that the proposed smoothed random forest model consistently improves predictive performance across diverse test cases, particularly in data-scarce settings. Code, datasets, and experiment results are publicly available at https://github.com/Neal-Liu-Ziyi/SmoothedRandomForest.git.
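To make the idea of "kernel smoothing applied to a piecewise constant output" concrete, here is a minimal sketch. It assumes an isotropic Gaussian kernel with bandwidth h, which is our illustrative choice and not necessarily the kernel or estimator used in the paper. Under that assumption the smoothed predictor is the convolution f_h(x) = E[f(x + u)] with u ~ N(0, h^2 I), estimated here by Monte Carlo averaging. The helpers `smoothed_predict` and `step_predict` are hypothetical; `predict` can be any function mapping an (n, d) array to n predictions, for example the `predict` method of a fitted scikit-learn `RandomForestRegressor`.

```python
import numpy as np


def smoothed_predict(predict, X, bandwidth, n_samples=2000, seed=0):
    """Monte Carlo estimate of a Gaussian-kernel-smoothed predictor (illustrative sketch).

    Approximates f_h(x) = E_{u ~ N(0, h^2 I)}[f(x + u)], i.e. the convolution of a
    piecewise constant predictor f with an isotropic Gaussian kernel of width h.
    """
    rng = np.random.default_rng(seed)
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n, d = X.shape
    # Draw perturbations u_j ~ N(0, h^2 I), shared across all query points.
    U = rng.normal(scale=bandwidth, size=(n_samples, d))
    # Build every perturbed point x_i + u_j: shape (n, n_samples, d), flattened row-wise.
    perturbed = (X[:, None, :] + U[None, :, :]).reshape(-1, d)
    # Evaluate the unsmoothed predictor once on all perturbed points.
    preds = np.asarray(predict(perturbed), dtype=float).reshape(n, n_samples)
    # Average over perturbations to obtain the smoothed prediction at each x_i.
    return preds.mean(axis=1)


def step_predict(X):
    """Hypothetical stand-in for a tree: a single cut at x_0 = 0 with leaf values 0 and 1."""
    X = np.atleast_2d(X)
    return np.where(X[:, 0] > 0.0, 1.0, 0.0)


X_query = np.array([[-1.0], [0.0], [1.0]])
print(step_predict(X_query))  # [0. 0. 1.]
print(smoothed_predict(step_predict, X_query, bandwidth=0.25))
```

For this single-cut example, 1{x + u > 0} = 1{x > -u}, so Gaussian smoothing is equivalent to jittering the cut point itself with N(0, h^2) noise, and the exact smoothed value is Phi(x / h) (Phi being the standard normal CDF). This gives one informal view of how smoothing can reflect uncertainty in cut-point locations; the paper's own derivation for resampled training inputs may differ.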
Cite
@article{arxiv.2505.06852,
title = {Improving Random Forests by Smoothing},
author = {Ziyi Liu and Phuc Luong and Mario Boley and Daniel F. Schmidt},
journal = {arXiv preprint arXiv:2505.06852},
year = {2026}
}
Comments
v2: Accepted manuscript. 30 pages (18 main + 12 appendix), 6 figures