Related papers: FastForest: Increasing Random Forest Processing Sp…
Random Forests are one of the most popular classifiers in machine learning. The larger they are, the more precise is the outcome of their predictions. However, this comes at a cost: their running time for classification grows linearly with…
Random Forests have been one of the most popular bagging methods in the past few decades, especially due to their success at handling tabular datasets. They have been extensively studied and compared to boosting models, like XGBoost, which…
Random forests are some of the most widely used machine learning models today, especially in domains that necessitate interpretability. We present an algorithm that accelerates the training of random forests and other popular tree-based…
As there is a growing interest in utilizing data across multiple resources to build better machine learning models, many vertically federated learning algorithms have been proposed to preserve the data privacy of the participating…
The wealth of data being gathered about humans and their surroundings drives new machine learning applications in various fields. Consequently, more and more often, classifiers are trained using not only numerical data but also complex data…
Random forest (RF) missing data algorithms are an attractive approach for dealing with missing data. They have the desirable properties of being able to handle mixed types of missing data, they are adaptive to interactions and nonlinearity,…
Several studies have shown that combining machine learning models in an appropriate way will introduce improvements in the individual predictions made by the base models. The key to make well-performing ensemble model is in the diversity of…
Random forest regression is a powerful non-parametric method that adapts to local data characteristics through data-driven partitioning, making it effective across diverse application domains. However, the piecewise constant nature of…
In materials science, data-driven methods accelerate material discovery and optimization while reducing costs and improving success rates. Symbolic regression is a key to extracting material descriptors from large datasets, in particular…
Hash codes are a very efficient data representation needed to be able to cope with the ever growing amounts of data. We introduce a random forest semantic hashing scheme with information-theoretic code aggregation, showing for the first…
An algorithm to improve performance parameter for unsupervised decision forest clustering and density estimation is presented. Specifically, a dual assignment parameter is introduced as a density estimator by combining Random Forest and…
Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (2001) 5--32] that combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical…
Ensemble learning methods are designed to benefit from multiple learning algorithms for better predictive performance. The tradeoff of this improved performance is slower speed and larger size of ensemble learning systems compared to single…
As the size, complexity, and availability of data continues to grow, scientists are increasingly relying upon black-box learning algorithms that can often provide accurate predictions with minimal a priori model specifications. Tools like…
Random forest is widely exploited as an ensemble learning method. In many practical applications, however, there is still a significant challenge to learn from imbalanced data. To alleviate this limitation, we propose a deep dynamic boosted…
In recent years, dynamically growing data and incrementally growing number of classes pose new challenges to large-scale data classification research. Most traditional methods struggle to balance the precision and computational burden when…
In recent years, the growth of Internet of Things (IoT) as an emerging technology has been unbelievable. The number of networkenabled devices in IoT domains is increasing dramatically, leading to the massive production of electronic data.…
Random forests are a widely used machine learning algorithm, but their computational efficiency is undermined when applied to large-scale datasets with numerous instances and useless features. Herein, we propose a nonparametric feature…
Decision forests, including random forests and gradient boosting trees, remain the leading machine learning methods for many real-world data problems, especially on tabular data. However, most of the current implementations only operate in…
Machine learning has an emerging critical role in high-performance computing to modulate simulations, extract knowledge from massive data, and replace numerical models with efficient approximations. Decision forests are a critical tool…