Related papers: On the Convergence of CART under Sufficient Impuri…

Analyzing CART

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For binary classification and regression models, this approach recursively divides the data into two near-homogenous…

Machine Learning · Statistics 2020-08-17 Jason M. Klusowski

Risk Bounds for CART Classifiers under a Margin Condition

Risk bounds for Classification and Regression Trees (CART, Breiman et al. 1984) classifiers are obtained under a margin condition in the binary supervised classification framework. These risk bounds are obtained conditionally on the…

Machine Learning · Statistics 2012-06-27 Servane Gey

Propensity score estimation using classification and regression trees in the presence of missing covariate data

Data mining and machine learning techniques such as classification and regression trees (CART) represent a promising alternative to conventional logistic regression for propensity score estimation. Whereas incomplete data preclude the…

Machine Learning · Statistics 2018-07-26 Bas B. L. Penning de Vries, Maarten van Smeden, Rolf H. H. Groenwold

Tree-Values: selective inference for regression trees

We consider conducting inference on the output of the Classification and Regression Tree (CART) [Breiman et al., 1984] algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data…

Methodology · Statistics 2022-10-19 Anna C. Neufeld, Lucy L. Gao, Daniela M. Witten

Covariance-Driven Regression Trees: Reducing Overfitting in CART

Decision trees are powerful machine learning algorithms, widely used in fields such as economics and medicine for their simplicity and interpretability. However, decision trees such as CART are prone to overfitting, especially when grown…

Machine Learning · Statistics 2026-01-13 Likun Zhang, Wei Ma

Sparse learning with CART

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogenous daughter nodes according to…

Machine Learning · Statistics 2020-11-20 Jason M. Klusowski

Risk Bounds for Embedded Variable Selection in Classification Trees

The problems of model and variable selections for classification trees are jointly considered. A penalized criterion is proposed which explicitly takes into account the number of variables, and a risk bound inequality is provided for the…

Statistics Theory · Mathematics 2012-06-27 Servane Gey, Tristan Mary-Huard

Uncertainty Quantification for Bayesian CART

This work affords new insights into Bayesian CART in the context of structured wavelet shrinkage. The main thrust is to develop a formal inferential framework for Bayesian tree-based regression. We reframe Bayesian CART as a g-type prior…

Statistics Theory · Mathematics 2021-05-25 Ismael Castillo, Veronika Rockova

TREE: Tree Regularization for Efficient Execution

The rise of machine learning methods on heavily resource constrained devices requires not only the choice of a suitable model architecture for the target platform, but also the optimization of the chosen model with regard to execution time…

Machine Learning · Computer Science 2024-06-19 Lena Schmid, Daniel Biebert, Christian Hakert, Kuan-Hsun Chen, Michel Lang, Markus Pauly, Jian-Jia Chen

Optimal Convergence Rates of Deep Neural Networks in a Classification Setting

We establish optimal convergence rates up to a log-factor for a class of deep neural networks in a classification setting under a restraint sometimes referred to as the Tsybakov noise condition. We construct classifiers in a general setting…

Statistics Theory · Mathematics 2022-07-26 Joseph T. Meyer

On the Computational Efficiency of Bayesian Additive Regression Trees: An Asymptotic Analysis

Bayesian Additive Regression Trees (BART) is a popular Bayesian non-parametric regression model that is commonly used in causal inference and beyond. Its strong predictive performance is supported by well-developed estimation theory,…

Machine Learning · Statistics 2026-02-10 Yan Shuo Tan, Omer Ronen, Theo Saarinen, Bin Yu

Error Bounds and Singularity Degree in Semidefinite Programming

In semidefinite programming a proposed optimal solution may be quite poor in spite of having sufficiently small residual in the optimality conditions. This issue may be framed in terms of the discrepancy between forward error (the…

Optimization and Control · Mathematics 2019-08-14 Stefan Sremac, Hugo J. Woerdeman, Henry Wolkowicz

The Honest Truth About Causal Trees: Accuracy Limits for Heterogeneous Treatment Effect Estimation

Recursive decision trees are widely used to estimate heterogeneous causal treatment effects in experimental and observational studies. These methods are typically implemented using CART-type recursive partitioning and are often viewed as…

Statistics Theory · Mathematics 2026-03-19 Matias D. Cattaneo, Jason M. Klusowski, Ruiqi Rae Yu

A Bayesian Additive Regression Tree Model for Learning Conditional Average Treatment Effects in Regression Discontinuity Designs

This paper develops a performant Bayesian approach to conditional average treatment effect (CATE) estimation in regression discontinuity designs (RDD), an increasingly prevalent form of quasi-experiment that facilitates causal inference.…

Methodology · Statistics 2026-05-18 Rafael Alcantara, P. Richard Hahn, Hedibert F. Lopes

A Semi-supervised CART Model for Covariate Shift

Machine learning models used in medical applications often face challenges due to the covariate shift, which occurs when there are discrepancies between the distributions of training and target data. This can lead to decreased predictive…

Machine Learning · Computer Science 2024-12-24 Mingyang Cai, Thomas Klausch, Mark A. van de Wiel

Optimal randomized classification trees

Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning. CARTs are traditionally built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and…

Machine Learning · Statistics 2021-10-25 Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales

Using the framework of boosting, we prove that all impurity-based decision tree learning algorithms, including the classic ID3, C4.5, and CART, are highly noise tolerant. Our guarantees hold under the strongest noise model of nasty noise,…

Machine Learning · Computer Science 2022-06-20 Guy Blanc, Jane Lange, Ali Malik, Li-Yang Tan

Inference under Covariate-Adaptive Randomization with Imperfect Compliance

This paper studies inference in a randomized controlled trial (RCT) with covariate-adaptive randomization (CAR) and imperfect compliance of a binary treatment. In this context, we study inference on the LATE. As in Bugni et al. (2018, 2019),…

Econometrics · Economics 2023-07-25 Federico A. Bugni, Mengsi Gao

Regularized impurity reduction: Accurate decision trees with complexity guarantees

Decision trees are popular classification models, providing high accuracy and intuitive explanations. However, as the tree size grows the model interpretability deteriorates. Traditional tree-induction algorithms, such as C4.5 and CART,…

Machine Learning · Computer Science 2022-11-29 Guangyi Zhang, Aristides Gionis

Consistency of Random Forest Type Algorithms under a Probabilistic Impurity Decrease Condition

This paper derives a unifying theorem establishing consistency results for a broad class of tree-based algorithms. It improves current results in two aspects. First of all, it can be applied to algorithms that vary from traditional Random…

Statistics Theory · Mathematics 2024-02-22 Ricardo Blum, Munir Hiabu, Enno Mammen, Joseph T. Meyer