Related papers: FastForest: Increasing Random Forest Processing Sp…

Large Random Forests: Optimisation for Rapid Evaluation

Random Forests are one of the most popular classifiers in machine learning. The larger they are, the more precise is the outcome of their predictions. However, this comes at a cost: their running time for classification grows linearly with…

Machine Learning · Computer Science 2019-12-24 Frederik Gossen, Bernhard Steffen

Binary Classification: Is Boosting stronger than Bagging?

Random Forests have been one of the most popular bagging methods in the past few decades, especially due to their success at handling tabular datasets. They have been extensively studied and compared to boosting models, like XGBoost, which…

Machine Learning · Computer Science 2024-10-28 Dimitris Bertsimas, Vasiliki Stoumpou

MABSplit: Faster Forest Training Using Multi-Armed Bandits

Random forests are some of the most widely used machine learning models today, especially in domains that necessitate interpretability. We present an algorithm that accelerates the training of random forests and other popular tree-based…

Machine Learning · Computer Science 2022-12-16 Mo Tiwari, Ryan Kang, Je-Yong Lee, Sebastian Thrun, Chris Piech, Ilan Shomorony, Martin Jinye Zhang

An Efficient and Robust System for Vertically Federated Random Forest

As there is a growing interest in utilizing data across multiple resources to build better machine learning models, many vertically federated learning algorithms have been proposed to preserve the data privacy of the participating…

Machine Learning · Computer Science 2022-01-27 Houpu Yao, Jiazhou Wang, Peng Dai, Liefeng Bo, Yanqing Chen

Random Similarity Forests

The wealth of data being gathered about humans and their surroundings drives new machine learning applications in various fields. Consequently, more and more often, classifiers are trained using not only numerical data but also complex data…

Machine Learning · Computer Science 2022-04-13 Maciej Piernik, Dariusz Brzezinski, Pawel Zawadzki

Random Forest Missing Data Algorithms

Random forest (RF) missing data algorithms are an attractive approach for dealing with missing data. They have the desirable properties of being able to handle mixed types of missing data, they are adaptive to interactions and nonlinearity,…

Machine Learning · Statistics 2017-01-23 Fei Tang, Hemant Ishwaran

Improved Weighted Random Forest for Classification Problems

Several studies have shown that combining machine learning models in an appropriate way will introduce improvements in the individual predictions made by the base models. The key to make well-performing ensemble model is in the diversity of…

Machine Learning · Computer Science 2021-03-01 Mohsen Shahhosseini, Guiping Hu

Improving Random Forests by Smoothing

Random forest regression is a powerful non-parametric method that adapts to local data characteristics through data-driven partitioning, making it effective across diverse application domains. However, the piecewise constant nature of…

Machine Learning · Computer Science 2026-05-19 Ziyi Liu, Phuc Luong, Mario Boley, Daniel F. Schmidt

Boosting SISSO Performance on Small Sample Datasets by Using Random Forests Prescreening for Complex Feature Selection

In materials science, data-driven methods accelerate material discovery and optimization while reducing costs and improving success rates. Symbolic regression is a key to extracting material descriptors from large datasets, in particular…

Machine Learning · Computer Science 2024-10-01 Xiaolin Jiang, Guanqi Liu, Jiaying Xie, Zhenpeng Hu

Random Forests Can Hash

Hash codes are a very efficient data representation needed to be able to cope with the ever growing amounts of data. We introduce a random forest semantic hashing scheme with information-theoretic code aggregation, showing for the first…

Computer Vision and Pattern Recognition · Computer Science 2015-04-20 Qiang Qiu, Guillermo Sapiro, Alex Bronstein

Unsupervised Decision Forest for Data Clustering and Density Estimation

An algorithm to improve performance parameter for unsupervised decision forest clustering and density estimation is presented. Specifically, a dual assignment parameter is introduced as a density estimator by combining Random Forest and…

Computer Vision and Pattern Recognition · Computer Science 2015-07-19 Hayder Albehadili, Naz Islam

Consistency of random forests

Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (2001) 5--32] that combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical…

Statistics Theory · Mathematics 2015-08-11 Erwan Scornet, Gérard Biau, Jean-Philippe Vert

Crossbreeding in Random Forest

Ensemble learning methods are designed to benefit from multiple learning algorithms for better predictive performance. The tradeoff of this improved performance is slower speed and larger size of ensemble learning systems compared to single…

Machine Learning · Computer Science 2021-01-22 Abolfazl Nadi, Hadi Moradi, Khalil Taheri

Getting Better from Worse: Augmented Bagging and a Cautionary Tale of Variable Importance

As the size, complexity, and availability of data continues to grow, scientists are increasingly relying upon black-box learning algorithms that can often provide accurate predictions with minimal a priori model specifications. Tools like…

Machine Learning · Statistics 2020-11-10 Lucas Mentch, Siyu Zhou

Deep Dynamic Boosted Forest

Random forest is widely exploited as an ensemble learning method. In many practical applications, however, there is still a significant challenge to learn from imbalanced data. To alleviate this limitation, we propose a deep dynamic boosted…

Machine Learning · Computer Science 2022-03-08 Haixin Wang, Xingzhang Ren, Jinan Sun, Wei Ye, Long Chen, Muzhi Yu, Shikun Zhang

hi-RF: Incremental Learning Random Forest for large-scale multi-class Data Classification

In recent years, dynamically growing data and incrementally growing number of classes pose new challenges to large-scale data classification research. Most traditional methods struggle to balance the precision and computational burden when…

Machine Learning · Computer Science 2016-11-01 Tingting Xie, Yuxing Peng, Changjian Wang

Performance Analysis and Comparison of Machine and Deep Learning Algorithms for IoT Data Classification

In recent years, the growth of Internet of Things (IoT) as an emerging technology has been unbelievable. The number of network-enabled devices in IoT domains is increasing dramatically, leading to the massive production of electronic data.…

Machine Learning · Computer Science 2020-01-29 Meysam Vakili, Mohammad Ghamsari, Masoumeh Rezaei

Nonparametric Feature Selection by Random Forests and Deep Neural Networks

Random forests are a widely used machine learning algorithm, but their computational efficiency is undermined when applied to large-scale datasets with numerous instances and useless features. Herein, we propose a nonparametric feature…

Machine Learning · Computer Science 2022-01-19 Xiaojun Mao, Liuhua Peng, Zhonglei Wang

Extremely Simple Streaming Forest

Decision forests, including random forests and gradient boosting trees, remain the leading machine learning methods for many real-world data problems, especially on tabular data. However, most of the current implementations only operate in…

Machine Learning · Computer Science 2025-06-27 Haoyin Xu, Jayanta Dey, Sambit Panda, Joshua T. Vogelstein

Forest Packing: Fast, Parallel Decision Forests

Machine learning has an emerging critical role in high-performance computing to modulate simulations, extract knowledge from massive data, and replace numerical models with efficient approximations. Decision forests are a critical tool…

Performance · Computer Science 2018-06-22 James Browne, Tyler M. Tomita, Disa Mhembere, Randal Burns, Joshua T. Vogelstein