Related papers: Nonparametric Variable Screening with Optimal Deci…

Optimal Sparse Recovery with Decision Stumps

Decision trees are widely used for their low computational cost, good predictive performance, and ability to assess the importance of features. Though often used in practice for feature selection, the theoretical guarantees of these methods…

Machine Learning · Statistics 2023-03-09 Kiarash Banihashem, MohammadTaghi Hajiaghayi, Max Springer

Ranking Perspective for Tree-based Methods with Applications to Symbolic Feature Selection

Tree-based methods are powerful nonparametric techniques in statistics and machine learning. However, their effectiveness, particularly in finite-sample settings, is not fully understood. Recent applications have revealed their surprising…

Statistics Theory · Mathematics 2024-10-04 Hengrui Luo, Meng Li

Variable Selection with ABC Bayesian Forests

Few problems in statistics are as perplexing as variable selection in the presence of very many redundant covariates. The variable selection problem is most familiar in parametric environments such as the linear model or additive variants…

Methodology · Statistics 2021-02-25 Yi Liu, Veronika Ročková, Yuexi Wang

Optimal randomized classification trees

Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning. CARTs are traditionally built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and…

Machine Learning · Statistics 2021-10-25 Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales

Improving the precision of classification trees

Besides serving as prediction models, classification trees are useful for finding important predictor variables and identifying interesting subgroups in the data. These functions can be compromised by weak split selection algorithms that…

Applications · Statistics 2010-11-03 Wei-Yin Loh

Tree-Structured Boosting: Connections Between Gradient Boosted Stumps and Full Decision Trees

Additive models, such as produced by gradient boosting, and full interaction models, such as classification and regression trees (CART), are widely used algorithms that have been investigated largely in isolation. We show that these models…

Machine Learning · Statistics 2017-11-21 José Marcio Luna, Eric Eaton, Lyle H. Ungar, Eric Diffenderfer, Shane T. Jensen, Efstathios D. Gennatas, Mateo Wirth, Charles B. Simone, Timothy D. Solberg, Gilmer Valdes

Screening Methods for Classification Based on Non-parametric Bayesian Tests

Feature or variable selection is a problem inherent to large data sets. While many methods have been proposed to deal with this problem, some can scale poorly with the number of predictors in a data set. Screening methods scale linearly…

Methodology · Statistics 2023-01-09 Naveed Merchant, Jeffrey D. Hart

Classification Trees with Valid Inference via the Exponential Mechanism

Decision trees are widely used for non-linear modeling, as they capture interactions between predictors while producing inherently interpretable models. Despite their popularity, performing inference on the non-linear fit remains largely…

Methodology · Statistics 2026-04-14 Soham Bakshi, Snigdha Panigrahi

Analyzing CART

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For binary classification and regression models, this approach recursively divides the data into two near-homogenous…

Machine Learning · Statistics 2020-08-17 Jason M. Klusowski

Variable Selection Using Bayesian Additive Regression Trees

Variable selection is an important statistical problem. This problem becomes more challenging when the candidate predictors are of mixed type (e.g. continuous and binary) and impact the response variable in nonlinear and/or non-additive…

Methodology · Statistics 2021-12-30 Chuji Luo, Michael J. Daniels

Conditional nonparametric variable screening by neural factor regression

High-dimensional covariates often admit linear factor structure. To effectively screen correlated covariates in high-dimension, we propose a conditional variable screening test based on non-parametric regression using neural networks due to…

Econometrics · Economics 2024-08-21 Jianqing Fan, Weining Wang, Yue Zhao

(De-)Randomized Smoothing for Decision Stump Ensembles

Tree-based models are used in many high-stakes application domains such as finance and medicine, where robustness and interpretability are of utmost importance. Yet, methods for improving and certifying their robustness are severely…

Machine Learning · Computer Science 2022-11-16 Miklós Z. Horváth, Mark Niklas Müller, Marc Fischer, Martin Vechev

Variational Nonparametric Discriminant Analysis

Variable selection and classification are common objectives in the analysis of high-dimensional data. Most such methods make distributional assumptions that may not be compatible with the diverse families of distributions data can take. A…

Methodology · Statistics 2019-08-28 Weichang Yu, Lamiae Azizi, John T. Ormerod

Non-uniform Feature Sampling for Decision Tree Ensembles

We study the effectiveness of non-uniform randomized feature selection in decision tree classification. We experimentally evaluate two feature selection methodologies, based on information extracted from the provided dataset: $(i)$…

Machine Learning · Statistics 2014-03-25 Anastasios Kyrillidis, Anastasios Zouzias

Variable importance in binary regression trees and forests

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally…

Machine Learning · Statistics 2009-09-29 Hemant Ishwaran

Determination of class-specific variables in nonparametric multiple-class classification

As technology advanced, collecting data via automatic collection devices become popular, thus we commonly face data sets with lengthy variables, especially when these data sets are collected without specific research goals beforehand. It…

Machine Learning · Statistics 2022-05-10 Wan-Ping Nicole Chen, Yuan-chin Ivan Chang

Asymptotic Properties of High-Dimensional Random Forests

As a flexible nonparametric learning tool, the random forests algorithm has been widely applied to various real applications with appealing empirical performance, even in the presence of high-dimensional feature space. Unveiling the…

Statistics Theory · Mathematics 2022-09-27 Chien-Ming Chi, Patrick Vossler, Yingying Fan, Jinchi Lv

Variable selection for BART: An application to gene regulation

We consider the task of discovering gene regulatory networks, which are defined as sets of genes and the corresponding transcription factors which regulate their expression levels. This can be viewed as a variable selection problem,…

Methodology · Statistics 2014-12-04 Justin Bleich, Adam Kapelner, Edward I. George, Shane T. Jensen

Tree-Values: selective inference for regression trees

We consider conducting inference on the output of the Classification and Regression Tree (CART) [Breiman et al., 1984] algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data…

Methodology · Statistics 2022-10-19 Anna C. Neufeld, Lucy L. Gao, Daniela M. Witten

On Variable Screening in Multiple Nonparametric Regression Model

In this article, we study the problem of variable screening in multiple nonparametric regression model. The proposed methodology is based on the fact that the partial derivative of the regression function with respect to the irrelevant…

Methodology · Statistics 2021-01-19 Subhra Sankar Dhar, Prashant Jha, Aranyak Acharyya