Related papers: S-SIRUS: an explainability algorithm for spatial r…
We introduce SIRUS (Stable and Interpretable RUle Set) for regression, a stable rule learning algorithm which takes the form of a short and simple list of rules. State-of-the-art learning algorithms are often referred to as "black boxes"…
State-of-the-art learning algorithms, such as random forests or neural networks, are often qualified as "black-boxes" because of the high number and complexity of operations involved in their prediction mechanism. This lack of…
Random Forest (RF) is a well-known data-driven algorithm applied in several fields thanks to its flexibility in modeling the relationship between the response variable and the predictors, also in case of strong non-linearities. In…
Random forest (RF) is one of the most popular methods for estimating regression functions. The local nature of the RF algorithm, based on intra-node means and variances, is ideal when errors are i.i.d. For dependent error processes like…
In materials science, data-driven methods accelerate material discovery and optimization while reducing costs and improving success rates. Symbolic regression is a key to extracting material descriptors from large datasets, in particular…
Random forest (RF) methodology is one of the most popular machine learning techniques for prediction problems. In this article, we discuss some cases where random forests may suffer and propose a novel generalized RF method, namely…
Random Forests (RF) is a popular machine learning method for classification and regression problems. It involves a bagging application to decision tree models. One of the primary advantages of the Random Forests model is the reduction in…
Random Forest (RF) is a powerful ensemble method for classification and regression tasks. It consists of decision trees set. Although, a single tree is well interpretable for human, the ensemble of trees is a black-box model. The popular…
We initiate a novel approach to explain the predictions and out of sample performance of random forest (RF) regression and classification models by exploiting the fact that any RF can be mathematically formulated as an adaptive weighted K…
Random forest (RF) missing data algorithms are an attractive approach for dealing with missing data. They have the desirable properties of being able to handle mixed types of missing data, they are adaptive to interactions and nonlinearity,…
Random Forests [Breiman:2001] (RF) are a fully non-parametric statistical method requiring no distributional assumptions on covariate relation to the response. RF are a robust, nonlinear technique that optimizes predictive accuracy by…
Sparse residual tree (SRT) is an adaptive exploration method for multivariate scattered data approximation. It leads to sparse and stable approximations in areas where the data is sufficient or redundant, and points out the possible local…
An increasing array of biomedical and computer vision applications requires the predictive modeling of complex data, for example images and shapes. The main challenge when predicting such objects lies in the fact that they do not comply to…
Random forest regression is a powerful non-parametric method that adapts to local data characteristics through data-driven partitioning, making it effective across diverse application domains. However, the piecewise constant nature of…
Random Forest (RF) is a widely used ensemble learning technique known for its robust classification performance across diverse domains. However, it often relies on hundreds of trees and all input features, leading to high inference cost and…
It is widely recognised that semiparametric efficient estimation can be hard to achieve in practice: estimators that are in theory efficient may require unattainable levels of accuracy for the estimation of complex nuisance functions. As a…
Random Forest (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data,…
Predicting rare outcomes such as startup success is central to venture capital, demanding models that are both accurate and interpretable. We introduce Random Rule Forest (RRF), a lightweight ensemble method that uses a large language model…
Machine learning (ML) algorithms become increasingly important in the analysis of astronomical data. However, since most ML algorithms are not designed to take data uncertainties into account, ML based studies are mostly restricted to data…
Spatially Coherent Random Forest (SCRF) extends Random Forest to create spatially coherent labeling. Each split function in SCRF is evaluated based on a traditional information gain measure that is regularized by a spatial coherency term.…