Related papers: Big Data Regression Using Tree Based Segmentation
We present an algorithm for classification tasks on big data. Experiments conducted as part of this study indicate that the algorithm can be as accurate as ensemble methods such as random forests or gradient boosted trees. Unlike ensemble…
Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning. CARTs are traditionally built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and…
The gradient boosting machine is a powerful ensemble-based machine learning method for solving regression problems. However, one of the difficulties of its using is a possible discontinuity of the regression function, which arises when…
We propose a tree-based algorithm for classification and regression problems in the context of functional data analysis, which allows to leverage representation learning and multiple splitting rules at the node level, reducing…
Additive models, such as produced by gradient boosting, and full interaction models, such as classification and regression trees (CART), are widely used algorithms that have been investigated largely in isolation. We show that these models…
Regression trees are a popular machine learning algorithm that fit piecewise constant models by recursively partitioning the predictor space. This paper focuses on statistical inference for a data-dependent model obtained from a fitted…
Based on decision trees, many fields have arguably made tremendous progress in recent years. In simple words, decision trees use the strategy of "divide-and-conquer" to divide the complex problem on the dependency between input features and…
This paper presents a computationally efficient variant of gradient boosting for multi-class classification and multi-output regression tasks. Standard gradient boosting uses a 1-vs-all strategy for classifications tasks with more than two…
In many applications of supervised learning, multiple classification or regression outputs have to be predicted jointly. We consider several extensions of gradient boosting to address such problems. We first propose a straightforward…
We propose a novel method designed for large-scale regression problems, namely the two-stage best-scored random forest (TBRF). "Best-scored" means to select one regression tree with the best empirical performance out of a certain number of…
Preprocessing data is an important step before any data analysis. In this paper, we focus on one particular aspect, namely scaling or normalization. We analyze various scaling methods in common use and study their effects on different…
Ren et al. recently introduced a method for aggregating multiple decision trees into a strong predictor by interpreting a path taken by a sample down each tree as a binary vector and performing linear regression on top of these vectors…
Regression trees have emerged as a preeminent tool for solving real-world regression problems due to their ability to deal with nonlinearities, interaction effects and sharp discontinuities. In this article, we rather study regression trees…
Previous algorithms for constructing regression tree models for longitudinal and multiresponse data have mostly followed the CART approach. Consequently, they inherit the same selection biases and computational difficulties as CART. We…
Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogenous daughter nodes according to…
Using ensemble methods for regression has been a large success in obtaining high-accuracy prediction. Examples are Bagging, Random forest, Boosting, BART (Bayesian additive regression tree), and their variants. In this paper, we propose a…
The gradient boosting machine is one of the powerful tools for solving regression problems. In order to cope with its shortcomings, an approach for constructing ensembles of gradient boosting models is proposed. The main idea behind the…
Besides serving as prediction models, classification trees are useful for finding important predictor variables and identifying interesting subgroups in the data. These functions can be compromised by weak split selection algorithms that…
Gaussian Processes are widely used for regression tasks. A known limitation in the application of Gaussian Processes to regression tasks is that the computation of the solution requires performing a matrix inversion. The solution also…
Gradient boosting algorithms construct a regression predictor using a linear combination of ``base learners''. Boosting also offers an approach to obtaining robust non-parametric regression estimators that are scalable to applications with…