Related papers: Interpretable random forest models through forward…
We introduce a novel interpretable tree based algorithm for prediction in a regression setting. Our motivation is to estimate the unknown regression function from a functional decomposition perspective in which the functional components…
Random Forest has become one of the most popular tools for feature selection. Its ability to deal with high-dimensional data makes this algorithm especially useful for studies in neuroimaging and bioinformatics. Despite its popularity and…
Random forest is a classification algorithm well suited for microarray data: it shows excellent performance even when most predictive variables are noise, can be used when the number of variables is much larger than the number of…
Distributional regression aims at estimating the conditional distribution of a targetvariable given explanatory co-variates. It is a crucial tool for forecasting whena precise uncertainty quantification is required. A popular methodology…
Random forests is a state-of-the-art supervised machine learning method which behaves well in high-dimensional settings although some limitations may happen when $p$, the number of predictors, is much larger than the number of observations…
Random Forest (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data,…
Random forests are an ensemble method relevant for many problems, such as regression or classification. They are popular due to their good predictive performance (compared to, e.g., decision trees) requiring only minimal tuning of…
Random forests is a common non-parametric regression technique which performs well for mixed-type unordered data and irrelevant features, while being robust to monotonic variable transformations. Standard random forests, however, do not…
Combining machine learning with econometric analysis is becoming increasingly prevalent in both research and practice. A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest…
Machine learning algorithms often assume that training samples are independent. When data points are connected by a network, the induced dependency between samples is both a challenge, reducing effective sample size, and an opportunity to…
A random forest prediction can be computed by the scalar product of the labels of the training examples and a set of weights that are determined by the leafs of the forest into which the test object falls; each prediction can hence be…
Multi-target regression is useful in a plethora of applications. Although random forest models perform well in these tasks, they are often difficult to interpret. Interpretability is crucial in machine learning, especially when it can…
We develop Clustered Random Forests, a random forests algorithm for clustered data, arising from independent groups that exhibit within-cluster dependence. The leaf-wise predictions for each decision tree making up clustered random forests…
The Distributional Random Forest (DRF) is a recently introduced Random Forest algorithm to estimate multivariate conditional distributions. Due to its general estimation procedure, it can be employed to estimate a wide range of targets such…
Random forests are a machine learning method used to automatically classify datasets and consist of a multitude of decision trees. While these random forests often have higher performance and generalize better than a single decision tree,…
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction…
Random forest regression (RF) is an extremely popular tool for the analysis of high-dimensional data. Nonetheless, its benefits may be lessened in sparse settings due to weak predictors, and a pre-estimation dimension reduction (targeting)…
A random forest is a popular tool for estimating probabilities in machine learning classification tasks. However, the means by which this is accomplished is unprincipled: one simply counts the fraction of trees in a forest that vote for a…
In surveys, the interest lies in estimating finite population parameters such as population totals and means. In most surveys, some auxiliary information is available at the estimation stage. This information may be incorporated in the…
Most scientific publications follow the familiar recipe of (i) obtain data, (ii) fit a model, and (iii) comment on the scientific relevance of the effects of particular covariates in that model. This approach, however, ignores the fact that…