Related papers: Trees-Based Models for Correlated Data
This paper proposes FREEtree, a tree-based method for high dimensional longitudinal data with correlated features. Popular machine learning approaches, like Random Forests, commonly used for variable selection do not perform well when there…
Additive models, such as produced by gradient boosting, and full interaction models, such as classification and regression trees (CART), are widely used algorithms that have been investigated largely in isolation. We show that these models…
We introduce tree linear cascades, a class of linear structural equation models for which the error variables are uncorrelated but need not be Gaussian nor independent. We show that, in spite of this weak assumption, the tree structure of…
We present Collaborative Trees, a novel tree model designed for regression prediction, along with its bagging version, which aims to analyze complex statistical associations between features and uncover potential patterns inherent in the…
The increasing complexity of data requires methods and models that can effectively handle intricate structures, as simplifying them would result in loss of information. While several analytical tools have been developed to work with complex…
Tree-based ensemble methods, as Random Forests and Gradient Boosted Trees, have been successfully used for regression in many applications and research studies. Furthermore, these methods have been extended in order to deal with uncertainty…
Often machine learning methods are applied and results reported in cases where there is little to no information concerning accuracy of the output. Simply because a computer program returns a result does not insure its validity. If…
Analysis of sample survey data often requires adjustments to account for missing data in the outcome variables of principal interest. Standard adjustment methods based on item imputation or on propensity weighting factors rely heavily on…
Based on decision trees, many fields have arguably made tremendous progress in recent years. In simple words, decision trees use the strategy of "divide-and-conquer" to divide the complex problem on the dependency between input features and…
Battery performance datasets are typically non-normal and multicollinear. Extrapolating such datasets for model predictions needs attention to such characteristics. This study explores the impact of data normality in building machine…
Decision trees are important both as interpretable models amenable to high-stakes decision-making, and as building blocks of ensemble methods such as random forests and gradient boosting. Their statistical properties, however, are not well…
A compositional tree refers to a tree structure on a set of random variables where each random variable is a node and composition occurs at each non-leaf node of the tree. As a generalization of compositional data, compositional trees…
Generalized linear and additive models are very efficient regression tools but the selection of relevant terms becomes difficult if higher order interactions are needed. In contrast, tree-based methods also known as recursive partitioning…
This paper investigates the integration of gradient boosted decision trees and varying coefficient models. We introduce the tree boosted varying coefficient framework which justifies the implementation of decision tree boosting as the…
We introduce Hyper-Trees as a novel framework for modeling time series data using gradient boosted trees. Unlike conventional tree-based approaches that forecast time series directly, Hyper-Trees learn the parameters of a target time series…
Tree-structured models are a powerful alternative to parametric regression models if non-linear effects and interactions are present in the data. Yet, classical tree-structured models might not be appropriate if data comes in clusters of…
Dynamic regression trees are an attractive option for automatic regression and classification with complicated response surfaces in on-line application settings. We create a sequential tree model whose state changes in time with the…
Gradient boosted trees are competition-winning, general-purpose, non-parametric regressors, which exploit sequential model fitting and gradient descent to minimize a specific loss function. The most popular implementations are tailored to…
In this work, we propose a novel node splitting method for regression trees and incorporate it into the regression forest framework. Unlike traditional binary splitting, where the splitting rule is selected from a predefined set of binary…
Random forests are an ensemble method relevant for many problems, such as regression or classification. They are popular due to their good predictive performance (compared to, e.g., decision trees) requiring only minimal tuning of…