Related papers: Risk Bounds for Embedded Variable Selection in Cla…

Risk Bounds for CART Classifiers under a Margin Condition

Risk bounds for Classification and Regression Trees (CART; Breiman et al., 1984) classifiers are obtained under a margin condition in the binary supervised classification framework. These risk bounds are obtained conditionally on the…

Machine Learning · Statistics 2012-06-27 Servane Gey

Optimal randomized classification trees

Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning. CARTs are traditionally built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and…

Machine Learning · Statistics 2021-10-25 Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales

Penalized Split Criteria for Interpretable Trees

This paper describes techniques for growing classification and regression trees designed to induce visually interpretable trees. This is achieved by penalizing splits that extend the subset of features used in a particular branch of the…

Methodology · Statistics 2013-10-22 Alex Goldstein, Andreas Buja

Variable selection through CART

This paper deals with variable selection in the regression and binary classification frameworks. It proposes an automatic and exhaustive procedure which relies on the use of the CART algorithm and on model selection via penalization. This…

Statistics Theory · Mathematics 2011-01-05 Marie Sauvé, Christine Tuleau-Malot

Tree-Values: selective inference for regression trees

We consider conducting inference on the output of the Classification and Regression Tree (CART) [Breiman et al., 1984] algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data…

Methodology · Statistics 2022-10-19 Anna C. Neufeld, Lucy L. Gao, Daniela M. Witten

Penalized Variable Selection with Broken Adaptive Ridge Regression for Semi-competing Risks Data

Semi-competing risks data arise when both non-terminal and terminal events are considered in a model. Such data with multiple events of interest are frequently encountered in medical research and clinical trials. In this framework, terminal…

Methodology · Statistics 2022-11-21 Fatemeh Mahmoudi, Xuewen Lu

Analyzing CART

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For binary classification and regression models, this approach recursively divides the data into two near-homogeneous…

Machine Learning · Statistics 2020-08-17 Jason M. Klusowski

Covariance-Driven Regression Trees: Reducing Overfitting in CART

Decision trees are powerful machine learning algorithms, widely used in fields such as economics and medicine for their simplicity and interpretability. However, decision trees such as CART are prone to overfitting, especially when grown…

Machine Learning · Statistics 2026-01-13 Likun Zhang , Wei Ma

TREE: Tree Regularization for Efficient Execution

The rise of machine learning methods on heavily resource-constrained devices requires not only the choice of a suitable model architecture for the target platform, but also the optimization of the chosen model with regard to execution time…

Machine Learning · Computer Science 2024-06-19 Lena Schmid, Daniel Biebert, Christian Hakert, Kuan-Hsun Chen, Michel Lang, Markus Pauly, Jian-Jia Chen

Variable selection for model-based clustering using the integrated complete-data likelihood

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…

Methodology · Statistics 2016-12-23 Matthieu Marbac, Mohammed Sedki

Forest Garrote

Variable selection for high-dimensional linear models has received a lot of attention lately, mostly in the context of l1-regularization. Part of the attraction is the variable selection effect: parsimonious models are obtained, which are…

Machine Learning · Statistics 2009-06-22 Nicolai Meinshausen

Sparse learning with CART

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogeneous daughter nodes according to…

Machine Learning · Statistics 2020-11-20 Jason M. Klusowski

Adaptive bridge regression modeling with model selection criteria

We consider the problem of constructing an adaptive bridge regression model, a penalized procedure that imposes different weights on different coefficients in the bridge penalty term. A crucial issue in the modeling process is…

Methodology · Statistics 2013-02-15 Shuichi Kawano

An Empirical Comparison of V-fold Penalisation and Cross Validation for Model Selection in Distribution-Free Regression

Model selection is a crucial issue in machine-learning and a wide variety of penalisation methods (with possibly data dependent complexity penalties) have recently been introduced for this purpose. However their empirical performance is…

Machine Learning · Statistics 2012-12-11 Charanpal Dhanjal, Nicolas Baskiotis, Stéphan Clémençon, Nicolas Usunier

Selection by Partitioning the Solution Paths

The performance of penalized likelihood approaches depends profoundly on the selection of the tuning parameter; however, there is no commonly agreed-upon criterion for choosing the tuning parameter. Moreover, penalized likelihood estimation…

Methodology · Statistics 2018-05-09 Yang Liu, Peng Wang

The Conditioning Bias in Binary Decision Trees and Random Forests and Its Elimination

Decision tree and random forest classification and regression are among the most widely used machine learning approaches. Binary decision tree implementations commonly use conditioning in the form 'feature ≤ (or <) threshold',…

Machine Learning · Computer Science 2023-12-19 Gábor Timár, György Kovács

Some upper bounds for the rate of convergence of penalized likelihood context tree estimators

We find upper bounds for the probability of underestimation and overestimation errors in penalized likelihood context tree estimation. The bounds are explicit and apply to processes of not necessarily finite memory. We allow for general…

Statistics Theory · Mathematics 2009-03-11 Florencia Leonardi

Improving Decision Trees through the Lens of Parameterized Local Search

Algorithms for learning decision trees often include heuristic local-search operations such as (1) adjusting the threshold of a cut or (2) also exchanging the feature of that cut. We study minimizing the number of classification errors by…

Machine Learning · Computer Science 2025-10-15 Juha Harviainen, Frank Sommer, Manuel Sorge

The asymptotic effect of tuning parameters

Tuning parameters are parameters involved in an estimating procedure for the purpose of reducing the risk of some other estimator. Examples include the degree of penalization in penalized regression and likelihood problems, as well as the…

Statistics Theory · Mathematics 2026-03-31 Ingrid Dæhlen, Nils Lid Hjort, Ingrid Hobæk Haff

Leveraging Predictive Equivalence in Decision Trees

Decision trees are widely used for interpretable machine learning due to their clearly structured reasoning process. However, this structure belies a challenge we refer to as predictive equivalence: a given tree's decision boundary can be…

Machine Learning · Computer Science 2025-10-15 Hayden McTavish, Zachery Boner, Jon Donnelly, Margo Seltzer, Cynthia Rudin