Related papers: Cross-Validated Variable Selection in Tree-Based M…

The Power of Unbiased Recursive Partitioning: A Unifying View of CTree, MOB, and GUIDE

A core step of every algorithm for learning regression trees is the selection of the best splitting variable from the available covariates and the corresponding split point. Early tree algorithms (e.g., AID, CART) employed greedy search…

Methodology · Statistics 2019-06-26 Lisa Schlosser, Torsten Hothorn, Achim Zeileis

Leave-One-Out Cross-Validation for Bayesian Model Comparison in Large Data

Recently, new methods for model assessment, based on subsampling and posterior approximations, have been proposed for scaling leave-one-out cross-validation (LOO) to large datasets. Although these methods work well for estimating predictive…

Methodology · Statistics 2020-08-12 Måns Magnusson, Michael Riis Andersen, Johan Jonasson, Aki Vehtari

Optimal randomized classification trees

Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning. CARTs are traditionally built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and…

Machine Learning · Statistics 2021-10-25 Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales

Using leave-one-out cross-validation (LOO) in a multilevel regression and poststratification (MRP) workflow: A cautionary tale

In recent decades, multilevel regression and poststratification (MRP) has surged in popularity for population inference. However, the validity of the estimates can depend on details of the model, and there is currently little research on…

Methodology · Statistics 2022-09-07 Swen Kuh, Lauren Kennedy, Qixuan Chen, Andrew Gelman

A New Flexible Train-Test Split Algorithm, an approach for choosing among the Hold-out, K-fold cross-validation, and Hold-out iteration

Choosing an appropriate strategy for partitioning data into training and evaluation sets is a critical step in machine learning, yet validation methods are often selected using default or conventional settings without considering their…

Machine Learning · Computer Science 2026-01-05 Zahra Bami, Ali Behnampour, Aniruddha Bora, Hassan Doosti

Distributional bias compromises leave-one-out cross-validation

Cross-validation is a common method for estimating the predictive performance of machine learning models. In a data-scarce regime, where one typically wishes to maximize the number of instances used for training the model, an approach…

Methodology · Statistics 2025-03-25 George I. Austin, Itsik Pe'er, Tal Korem

Cross-validating causal discovery via Leave-One-Variable-Out

We propose a new approach to falsify causal discovery algorithms without ground truth, which is based on testing the causal model on a pair of variables that has been dropped when learning the causal model. To this end, we use the…

Machine Learning · Statistics 2024-11-11 Daniela Schkoda, Philipp Faller, Patrick Blöbaum, Dominik Janzing

CO2 Forest: Improved Random Forest by Continuous Optimization of Oblique Splits

We propose a novel algorithm for optimizing multivariate linear threshold functions as split functions of decision trees to create improved Random Forest classifiers. Standard tree induction methods resort to sampling and exhaustive search…

Machine Learning · Computer Science 2015-06-26 Mohammad Norouzi, Maxwell D. Collins, David J. Fleet, Pushmeet Kohli

Strong Optimal Classification Trees

Decision trees are among the most popular machine learning models and are used routinely in applications ranging from revenue management and medicine to bioinformatics. In this paper, we consider the problem of learning optimal binary…

Machine Learning · Computer Science 2023-07-20 Sina Aghaei, Andrés Gómez, Phebe Vayanos

A scalable estimate of the extra-sample prediction error via approximate leave-one-out

The paper considers the problem of out-of-sample risk estimation under the high dimensional settings where standard techniques such as $K$-fold cross validation suffer from large biases. Motivated by the low bias of the leave-one-out cross…

Methodology · Statistics 2020-02-12 Kamiar Rahnama Rad, Arian Maleki

Bayesian leave-one-out cross-validation for large data

Model inference, such as model comparison, model checking, and model selection, is an important part of model development. Leave-one-out cross-validation (LOO) is a general approach for assessing the generalizability of a model, but…

Machine Learning · Statistics 2020-08-12 Måns Magnusson, Michael Riis Andersen, Johan Jonasson, Aki Vehtari

Robust leave-one-out cross-validation for high-dimensional Bayesian models

Leave-one-out cross-validation (LOO-CV) is a popular method for estimating out-of-sample predictive accuracy. However, computing LOO-CV criteria can be computationally expensive due to the need to fit the model multiple times. In the…

Computation · Statistics 2023-09-28 Luca Silva, Giacomo Zanella

Fast and robust cross-validation-based scoring rule inference for spatial statistics

Scoring rules are aimed at evaluation of the quality of predictions, but can also be used for estimation of parameters in statistical models. We propose estimating parameters of multivariate spatial models by maximising the average…

Methodology · Statistics 2024-08-23 Helga Kristin Olafsdottir, Holger Rootzén, David Bolin

Covariance-Driven Regression Trees: Reducing Overfitting in CART

Decision trees are powerful machine learning algorithms, widely used in fields such as economics and medicine for their simplicity and interpretability. However, decision trees such as CART are prone to overfitting, especially when grown…

Machine Learning · Statistics 2026-01-13 Likun Zhang, Wei Ma

Unbiased split variable selection for random survival forests using maximally selected rank statistics

The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction…

Machine Learning · Statistics 2018-05-17 Marvin N. Wright, Theresa Dankowski, Andreas Ziegler

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

Leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the…

Computation · Statistics 2017-12-18 Aki Vehtari, Andrew Gelman, Jonah Gabry

Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models

Cross-validation can be used to measure a model's predictive accuracy for the purpose of model comparison, averaging, or selection. Standard leave-one-out cross-validation (LOO-CV) requires that the observation model can be factorized into…

Methodology · Statistics 2021-06-21 Paul-Christian Bürkner, Jonah Gabry, Aki Vehtari

ZTree: A Subgroup Identification Based Decision Tree Learning Framework

Decision trees are a commonly used class of machine learning models valued for their interpretability and versatility, capable of both classification and regression. We propose ZTree, a novel decision tree learning framework that replaces…

Machine Learning · Computer Science 2025-09-17 Eric Cheng, Jie Cheng

PLUTO: Penalized Unbiased Logistic Regression Trees

We propose a new algorithm called PLUTO for building logistic regression trees to binary response data. PLUTO can capture the nonlinear and interaction patterns in messy data by recursively partitioning the sample space. It fits a simple or…

Machine Learning · Statistics 2014-11-26 Wenwen Zhang, Wei-Yin Loh

Improving the precision of classification trees

Besides serving as prediction models, classification trees are useful for finding important predictor variables and identifying interesting subgroups in the data. These functions can be compromised by weak split selection algorithms that…

Applications · Statistics 2010-11-03 Wei-Yin Loh