Related papers: Risk Bounds for CART Classifiers under a Margin Co…

Risk Bounds for Embedded Variable Selection in Classification Trees

The problems of model and variable selections for classification trees are jointly considered. A penalized criterion is proposed which explicitly takes into account the number of variables, and a risk bound inequality is provided for the…

Statistics Theory · Mathematics 2012-06-27 Servane Gey, Tristan Mary-Huard

Optimal randomized classification trees

Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning. CARTs are traditionally built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and…

Machine Learning · Statistics 2021-10-25 Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales

Analyzing CART

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For binary classification and regression models, this approach recursively divides the data into two near-homogenous…

Machine Learning · Statistics 2020-08-17 Jason M. Klusowski

On the Convergence of CART under Sufficient Impurity Decrease Condition

The decision tree is a flexible machine learning model that finds its success in numerous applications. It is usually fitted in a recursively greedy manner using CART. In this paper, we investigate the convergence rate of CART under a…

Machine Learning · Statistics 2023-10-27 Rahul Mazumder, Haoyue Wang

Tree-Values: selective inference for regression trees

We consider conducting inference on the output of the Classification and Regression Tree (CART) [Breiman et al., 1984] algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data…

Methodology · Statistics 2022-10-19 Anna C. Neufeld, Lucy L. Gao, Daniela M. Witten

Sparse learning with CART

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogenous daughter nodes according to…

Machine Learning · Statistics 2020-11-20 Jason M. Klusowski

Covariance-Driven Regression Trees: Reducing Overfitting in CART

Decision trees are powerful machine learning algorithms, widely used in fields such as economics and medicine for their simplicity and interpretability. However, decision trees such as CART are prone to overfitting, especially when grown…

Machine Learning · Statistics 2026-01-13 Likun Zhang, Wei Ma

Classification algorithms using adaptive partitioning

Algorithms for binary classification based on adaptive tree partitioning are formulated and analyzed for both their risk performance and their friendliness to numerical implementation. The algorithms can be viewed as generating a set…

Statistics Theory · Mathematics 2014-11-05 Peter Binev, Albert Cohen, Wolfgang Dahmen, Ronald DeVore

Variable selection through CART

This paper deals with variable selection in the regression and binary classification frameworks. It proposes an automatic and exhaustive procedure which relies on the use of the CART algorithm and on model selection via penalization. This…

Statistics Theory · Mathematics 2011-01-05 Marie Sauvé, Christine Tuleau-Malot

Minimax learning rates for estimating binary classifiers under margin conditions

We study classification problems using binary estimators where the decision boundary is described by horizon functions and where the data distribution satisfies a geometric margin condition. A key novelty of our work is the derivation of…

Machine Learning · Statistics 2026-03-16 Jonathan García, Philipp Petersen

Improved Classification Rates under Refined Margin Conditions

In this paper we present a simple partitioning based technique to refine the statistical analysis of classification algorithms. The core of our idea is to divide the input space into two parts such that the first part contains a suitable…

Statistics Theory · Mathematics 2018-03-06 Ingrid Blaschzyk, Ingo Steinwart

Risk bounds for statistical learning

We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM). We essentially focus on the binary classification framework. We extend Tsybakov's analysis of the risk of an ERM under margin type…

Statistics Theory · Mathematics 2016-08-14 Pascal Massart, Élodie Nédélec

The Conditioning Bias in Binary Decision Trees and Random Forests and Its Elimination

Decision tree and random forest classification and regression are some of the most widely used machine learning approaches. Binary decision tree implementations commonly use conditioning in the form 'feature $\leq$ (or $<$) threshold',…

Machine Learning · Computer Science 2023-12-19 Gábor Timár, György Kovács

Square Root Penalty: Adaptation to the Margin in Classification and in Edge Estimation

We consider the problem of adaptation to the margin in binary classification. We suggest a penalized empirical risk minimization classifier that adaptively attains, up to a logarithmic factor, fast optimal rates of convergence for the…

Statistics Theory · Mathematics 2007-06-13 A. B. Tsybakov, S. A. van de Geer

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

Modern machine learning systems such as deep neural networks are often highly over-parameterized so that they can fit the noisy training data exactly, yet they can still achieve small test errors in practice. In this paper, we study this…

Machine Learning · Computer Science 2022-01-04 Yuan Cao, Quanquan Gu, Mikhail Belkin

Tight Risk Bounds for Multi-Class Margin Classifiers

We consider a problem of risk estimation for large-margin multi-class classifiers. We propose a novel risk bound for the multi-class classification problem. The bound involves the marginal distribution of the classifier and the Rademacher…

Machine Learning · Statistics 2021-09-15 Yury Maximov, Daria Reshetova

Second Order PAC-Bayesian Bounds for the Weighted Majority Vote

We present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient…

Machine Learning · Computer Science 2020-12-18 Andrés R. Masegosa, Stephan S. Lorenzen, Christian Igel, Yevgeny Seldin

Structure-aware error bounds for linear classification with the zero-one loss

We prove risk bounds for binary classification in high-dimensional settings when the sample size is allowed to be smaller than the dimensionality of the training set observations. In particular, we prove upper bounds for both 'compressive…

Statistics Theory · Mathematics 2017-09-29 Ata Kaban, Robert J. Durrant

Classification using margin pursuit

In this work, we study a new approach to optimizing the margin distribution realized by binary classifiers. The classical approach to this problem is simply maximization of the expected margin, while more recent proposals consider…

Machine Learning · Statistics 2018-10-12 Matthew J. Holland

Multi-category Angle-based Classifier Refit

Classification is an important statistical learning tool. In real applications, besides high prediction accuracy, it is often desirable to estimate class conditional probabilities for new observations. For traditional problems where the…

Statistics Theory · Mathematics 2025-03-18 Guo Xian Yau, Chong Zhang