Related papers: Cross-Validated Variable Selection in Tree-Based M…

200 papers

A core step of every algorithm for learning regression trees is the selection of the best splitting variable from the available covariates and the corresponding split point. Early tree algorithms (e.g., AID, CART) employed greedy search…

Methodology · Statistics 2019-06-26 Lisa Schlosser, Torsten Hothorn, Achim Zeileis

Recently, new methods for model assessment, based on subsampling and posterior approximations, have been proposed for scaling leave-one-out cross-validation (LOO) to large datasets. Although these methods work well for estimating predictive…

Methodology · Statistics 2020-08-12 Måns Magnusson, Michael Riis Andersen, Johan Jonasson, Aki Vehtari

Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning. CARTs are traditionally built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and…

Machine Learning · Statistics 2021-10-25 Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales

In recent decades, multilevel regression and poststratification (MRP) has surged in popularity for population inference. However, the validity of the estimates can depend on details of the model, and there is currently little research on…

Methodology · Statistics 2022-09-07 Swen Kuh, Lauren Kennedy, Qixuan Chen, Andrew Gelman

Choosing an appropriate strategy for partitioning data into training and evaluation sets is a critical step in machine learning, yet validation methods are often selected using default or conventional settings without considering their…

Machine Learning · Computer Science 2026-01-05 Zahra Bami, Ali Behnampour, Aniruddha Bora, Hassan Doosti

Cross-validation is a common method for estimating the predictive performance of machine learning models. In a data-scarce regime, where one typically wishes to maximize the number of instances used for training the model, an approach…

Methodology · Statistics 2025-03-25 George I. Austin, Itsik Pe'er, Tal Korem

We propose a new approach to falsify causal discovery algorithms without ground truth, which is based on testing the causal model on a pair of variables that has been dropped when learning the causal model. To this end, we use the…

Machine Learning · Statistics 2024-11-11 Daniela Schkoda, Philipp Faller, Patrick Blöbaum, Dominik Janzing

We propose a novel algorithm for optimizing multivariate linear threshold functions as split functions of decision trees to create improved Random Forest classifiers. Standard tree induction methods resort to sampling and exhaustive search…

Machine Learning · Computer Science 2015-06-26 Mohammad Norouzi, Maxwell D. Collins, David J. Fleet, Pushmeet Kohli

Decision trees are among the most popular machine learning models and are used routinely in applications ranging from revenue management and medicine to bioinformatics. In this paper, we consider the problem of learning optimal binary…

Machine Learning · Computer Science 2023-07-20 Sina Aghaei, Andrés Gómez, Phebe Vayanos

The paper considers the problem of out-of-sample risk estimation under the high dimensional settings where standard techniques such as $K$-fold cross validation suffer from large biases. Motivated by the low bias of the leave-one-out cross…

Methodology · Statistics 2020-02-12 Kamiar Rahnama Rad, Arian Maleki

Model inference, such as model comparison, model checking, and model selection, is an important part of model development. Leave-one-out cross-validation (LOO) is a general approach for assessing the generalizability of a model, but…

Machine Learning · Statistics 2020-08-12 Måns Magnusson, Michael Riis Andersen, Johan Jonasson, Aki Vehtari

Leave-one-out cross-validation (LOO-CV) is a popular method for estimating out-of-sample predictive accuracy. However, computing LOO-CV criteria can be computationally expensive due to the need to fit the model multiple times. In the…

Computation · Statistics 2023-09-28 Luca Silva, Giacomo Zanella

Scoring rules are aimed at evaluation of the quality of predictions, but can also be used for estimation of parameters in statistical models. We propose estimating parameters of multivariate spatial models by maximising the average…

Methodology · Statistics 2024-08-23 Helga Kristin Olafsdottir, Holger Rootzén, David Bolin

Decision trees are powerful machine learning algorithms, widely used in fields such as economics and medicine for their simplicity and interpretability. However, decision trees such as CART are prone to overfitting, especially when grown…

Machine Learning · Statistics 2026-01-13 Likun Zhang, Wei Ma

The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction…

Machine Learning · Statistics 2018-05-17 Marvin N. Wright, Theresa Dankowski, Andreas Ziegler

Leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the…

Computation · Statistics 2017-12-18 Aki Vehtari, Andrew Gelman, Jonah Gabry

Cross-validation can be used to measure a model's predictive accuracy for the purpose of model comparison, averaging, or selection. Standard leave-one-out cross-validation (LOO-CV) requires that the observation model can be factorized into…

Methodology · Statistics 2021-06-21 Paul-Christian Bürkner, Jonah Gabry, Aki Vehtari

Decision trees are a commonly used class of machine learning models valued for their interpretability and versatility, capable of both classification and regression. We propose ZTree, a novel decision tree learning framework that replaces…

Machine Learning · Computer Science 2025-09-17 Eric Cheng, Jie Cheng

We propose a new algorithm called PLUTO for building logistic regression trees to binary response data. PLUTO can capture the nonlinear and interaction patterns in messy data by recursively partitioning the sample space. It fits a simple or…

Machine Learning · Statistics 2014-11-26 Wenwen Zhang, Wei-Yin Loh

Besides serving as prediction models, classification trees are useful for finding important predictor variables and identifying interesting subgroups in the data. These functions can be compromised by weak split selection algorithms that…

Applications · Statistics 2010-11-03 Wei-Yin Loh