Related papers: Tree-Values: selective inference for regression tr…

Optimal randomized classification trees

Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning. CARTs are traditionally built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and…

Machine Learning · Statistics 2021-10-25 Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales

Classification Trees with Valid Inference via the Exponential Mechanism

Decision trees are widely used for non-linear modeling, as they capture interactions between predictors while producing inherently interpretable models. Despite their popularity, performing inference on the non-linear fit remains largely…

Methodology · Statistics 2026-04-14 Soham Bakshi, Snigdha Panigrahi

Measure Inducing Classification and Regression Trees for Functional Data

We propose a tree-based algorithm for classification and regression problems in the context of functional data analysis, which allows leveraging representation learning and multiple splitting rules at the node level, reducing…

Machine Learning · Statistics 2020-11-03 Edoardo Belli, Simone Vantini

Sparse learning with CART

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogeneous daughter nodes according to…

Machine Learning · Statistics 2020-11-20 Jason M. Klusowski

Inference with Randomized Regression Trees

Regression trees are a popular machine learning algorithm that fit piecewise constant models by recursively partitioning the predictor space. This paper focuses on statistical inference for a data-dependent model obtained from a fitted…

Methodology · Statistics 2025-12-17 Soham Bakshi, Yiling Huang, Snigdha Panigrahi, Walter Dempsey

Covariance-Driven Regression Trees: Reducing Overfitting in CART

Decision trees are powerful machine learning algorithms, widely used in fields such as economics and medicine for their simplicity and interpretability. However, decision trees such as CART are prone to overfitting, especially when grown…

Machine Learning · Statistics 2026-01-13 Likun Zhang, Wei Ma

Regularisation of CART trees by summation of $p$-values

The standard procedure to decide on the complexity of a CART regression tree is to use cross-validation with the aim of obtaining a predictor that generalises well to unseen data. The randomness in the selection of folds implies that the…

Methodology · Statistics 2025-10-29 Nils Engler, Mathias Lindholm, Filip Lindskog, Taariq Nazar

Propensity score estimation using classification and regression trees in the presence of missing covariate data

Data mining and machine learning techniques such as classification and regression trees (CART) represent a promising alternative to conventional logistic regression for propensity score estimation. Whereas incomplete data preclude the…

Machine Learning · Statistics 2018-07-26 Bas B. L. Penning de Vries, Maarten van Smeden, Rolf H. H. Groenwold

Analyzing CART

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For binary classification and regression models, this approach recursively divides the data into two near-homogeneous…

Machine Learning · Statistics 2020-08-17 Jason M. Klusowski

A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds

Decision trees are important both as interpretable models amenable to high-stakes decision-making, and as building blocks of ensemble methods such as random forests and gradient boosting. Their statistical properties, however, are not well…

Machine Learning · Statistics 2021-10-20 Yan Shuo Tan, Abhineet Agarwal, Bin Yu

Dive into Decision Trees and Forests: A Theoretical Demonstration

Many fields have arguably made tremendous progress in recent years thanks to decision trees. In simple words, decision trees use the strategy of "divide-and-conquer" to divide the complex problem on the dependency between input features and…

Machine Learning · Computer Science 2021-01-22 Jinxiong Zhang

The Honest Truth About Causal Trees: Accuracy Limits for Heterogeneous Treatment Effect Estimation

Recursive decision trees are widely used to estimate heterogeneous causal treatment effects in experimental and observational studies. These methods are typically implemented using CART-type recursive partitioning and are often viewed as…

Statistics Theory · Mathematics 2026-03-19 Matias D. Cattaneo, Jason M. Klusowski, Ruiqi Rae Yu

Consensus Tree Estimation with False Discovery Rate Control via Partially Ordered Sets

Connected acyclic graphs (trees) are data objects that hierarchically organize categories. Collections of trees arise in a diverse variety of fields, including evolutionary biology, public health, machine learning, social sciences and…

Methodology · Statistics 2025-12-01 Maria Alejandra Valdez Cabrera, Amy D Willis, Armeen Taeb

Risk Bounds for Embedded Variable Selection in Classification Trees

The problems of model and variable selection for classification trees are jointly considered. A penalized criterion is proposed which explicitly takes into account the number of variables, and a risk bound inequality is provided for the…

Statistics Theory · Mathematics 2012-06-27 Servane Gey, Tristan Mary-Huard

Selective Inference for Testing Trees and Edges in Phylogenetics

Selective inference is considered for testing trees and edges in phylogenetic tree selection from molecular sequences. This improves the previously proposed approximately unbiased test by adjusting for the selection bias when testing many trees…

Applications · Statistics 2019-05-27 Hidetoshi Shimodaira, Yoshikazu Terada

Big Data Regression Using Tree Based Segmentation

Scaling regression to large datasets is a common problem in many application areas. We propose a two-step approach to scaling regression to large datasets. Using a regression tree (CART) to segment the large dataset constitutes the first…

Machine Learning · Statistics 2017-07-26 Rajiv Sambasivan, Sourish Das

Decision-Path Patterns as Tree Reliability Signals: Path-based Adaptive Weighting for Random Forest Classification

Random forests construct each tree with a different, randomised representation of the feature space. Their uniform voting cannot correct errors in regions where trees with incorrect representations probabilistically outnumber correct ones,…

Machine Learning · Computer Science 2026-05-28 Youngjoon Park

Large Scale Prediction with Decision Trees

This paper shows that decision trees constructed with Classification and Regression Trees (CART) and C4.5 methodology are consistent for regression and classification tasks, even when the number of predictor variables grows…

Machine Learning · Statistics 2023-11-15 Jason M. Klusowski, Peter M. Tian

ZTree: A Subgroup Identification Based Decision Tree Learning Framework

Decision trees are a commonly used class of machine learning models valued for their interpretability and versatility, capable of both classification and regression. We propose ZTree, a novel decision tree learning framework that replaces…

Machine Learning · Computer Science 2025-09-17 Eric Cheng, Jie Cheng

A Semi-supervised CART Model for Covariate Shift

Machine learning models used in medical applications often face challenges due to covariate shift, which occurs when the distributions of training and target data differ. This can lead to decreased predictive…

Machine Learning · Computer Science 2024-12-24 Mingyang Cai, Thomas Klausch, Mark A. van de Wiel