Related papers: Risk Bounds for CART Classifiers under a Margin Co…
The problems of model and variable selections for classification trees are jointly considered. A penalized criterion is proposed which explicitly takes into account the number of variables, and a risk bound inequality is provided for the…
Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning. CARTs are traditionally built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and…
Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For binary classification and regression models, this approach recursively divides the data into two near-homogenous…
The decision tree is a flexible machine learning model that finds its success in numerous applications. It is usually fitted in a recursively greedy manner using CART. In this paper, we investigate the convergence rate of CART under a…
We consider conducting inference on the output of the Classification and Regression Tree (CART) [Breiman et al., 1984] algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data…
Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogenous daughter nodes according to…
Decision trees are powerful machine learning algorithms, widely used in fields such as economics and medicine for their simplicity and interpretability. However, decision trees such as CART are prone to overfitting, especially when grown…
Algorithms for binary classification based on adaptive tree partitioning are formulated and analyzed for both their risk performance and their friendliness to numerical implementation. The algorithms can be viewed as generating a set…
This paper deals with variable selection in the regression and binary classification frameworks. It proposes an automatic and exhaustive procedure which relies on the use of the CART algorithm and on model selection via penalization. This…
We study classification problems using binary estimators where the decision boundary is described by horizon functions and where the data distribution satisfies a geometric margin condition. A key novelty of our work is the derivation of…
In this paper we present a simple partitioning based technique to refine the statistical analysis of classification algorithms. The core of our idea is to divide the input space into two parts such that the first part contains a suitable…
We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM).We essentially focus on the binary classification framework. We extend Tsybakov's analysis of the risk of an ERM under margin type…
Decision tree and random forest classification and regression are some of the most widely used in machine learning approaches. Binary decision tree implementations commonly use conditioning in the form 'feature $\leq$ (or $<$) threshold',…
We consider the problem of adaptation to the margin in binary classification. We suggest a penalized empirical risk minimization classifier that adaptively attains, up to a logarithmic factor, fast optimal rates of convergence for the…
Modern machine learning systems such as deep neural networks are often highly over-parameterized so that they can fit the noisy training data exactly, yet they can still achieve small test errors in practice. In this paper, we study this…
We consider a problem of risk estimation for large-margin multi-class classifiers. We propose a novel risk bound for the multi-class classification problem. The bound involves the marginal distribution of the classifier and the Rademacher…
We present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient…
We prove risk bounds for binary classification in high-dimensional settings when the sample size is allowed to be smaller than the dimensionality of the training set observations. In particular, we prove upper bounds for both 'compressive…
In this work, we study a new approach to optimizing the margin distribution realized by binary classifiers. The classical approach to this problem is simply maximization of the expected margin, while more recent proposals consider…
Classification is an important statistical learning tool. In real application, besides high prediction accuracy, it is often desirable to estimate class conditional probabilities for new observations. For traditional problems where the…