Related papers: Sequential Permutation Testing of Random Forest Va…

200 papers

Throughout the last decade, random forests have established themselves as among the most accurate and popular supervised learning methods. While their black-box nature has made their mathematical analysis difficult, recent work has…

Methodology · Statistics 2019-12-10 Tim Coleman, Wei Peng, Lucas Mentch

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally…

Machine Learning · Statistics 2009-09-29 Hemant Ishwaran

Random Forests have become a widely used tool in machine learning since their introduction in 2001, known for their strong performance in classification and regression tasks. One key feature of Random Forests is the Random Forest…

Statistics Theory · Mathematics 2025-12-18 Nico Föge, Lena Schmid, Marc Ditzhaus, Markus Pauly

Variable selection in sparse regression models is an important task as applications ranging from biomedical research to econometrics have shown. Especially for higher dimensional regression problems, for which the link function between…

Machine Learning · Statistics 2019-12-10 Burim Ramosaj, Markus Pauly

Decision trees and their ensembles are endowed with a rich set of diagnostic tools for ranking and screening variables in a predictive model. Despite the widespread use of tree based variable importance measures, pinning down their…

Machine Learning · Statistics 2020-12-14 Jason M. Klusowski, Peter M. Tian

Following the line of classification-based two-sample testing, tests based on the Random Forest classifier are proposed. The developed tests are easy to use, require almost no tuning, and are applicable for any distribution on…

Methodology · Statistics 2021-05-07 Simon Hediger, Loris Michel, Jeffrey Näf

Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference…

Machine Learning · Computer Science 2023-10-27 Ahmad Chamma, Denis A. Engemann, Bertrand Thirion

Invariance-based randomization tests -- such as permutation tests, rotation tests, or sign changes -- are an important and widely used class of statistical methods. They allow drawing inferences under weak assumptions on the data…

Statistics Theory · Mathematics 2022-05-31 Edgar Dobriban

Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such…

Methodology · Statistics 2021-03-25 Joshua Daniel Loyal, Ruoqing Zhu, Yifan Cui, Xin Zhang

Along with accurate prediction, understanding the contribution of each feature to the making of the prediction, i.e., the importance of the feature, is a desirable and arguably necessary component of a machine learning model. For a complex…

Machine Learning · Computer Science 2025-07-11 Aaron Foote, Danny Krizanc

Distributional Random Forest (DRF) is a flexible forest-based method to estimate the full conditional distribution of a multivariate output of interest given input variables. In this article, we introduce a variable importance algorithm for…

Machine Learning · Statistics 2024-02-15 Clément Bénard, Jeffrey Näf, Julie Josse

Quantifying the usefulness of individual features in random forests learning can greatly enhance its interpretability. Existing studies have shown that some popularly used feature importance measures for random forests suffer from the bias…

Machine Learning · Statistics 2023-11-14 Chien-Ming Chi, Yingying Fan, Jinchi Lv

We study properties of two resampling scenarios: Conditional Randomisation and Conditional Permutation schemes, which are relevant for testing conditional independence of discrete random variables $X$ and $Y$ given a random variable $Z$.…

Statistics Theory · Mathematics 2023-04-14 Małgorzata Łazęcka, Bartosz Kołodziejek, Jan Mielniczuk

A common problem in machine learning is determining if a variable significantly contributes to a model's prediction performance. This problem is aggravated for datasets, such as gene expression datasets, that suffer the worst case of…

Methodology · Statistics 2023-10-13 Yue Wu, Ted Spaide, Kenji Nakamichi, Russell Van Gelder, Aaron Lee

Permutation tests are widely used for statistical hypothesis testing when the sampling distribution of the test statistic under the null hypothesis is analytically intractable or unreliable due to finite sample sizes. One critical challenge…

Computation · Statistics 2023-08-29 Yang Shi, Huining Kang, Ji-Hyun Lee, Hui Jiang

Random forests is a state-of-the-art supervised machine learning method which behaves well in high-dimensional settings although some limitations may happen when $p$, the number of predictors, is much larger than the number of observations…

Methodology · Statistics 2019-02-01 Louis Capitaine, Robin Genuer, Rodolphe Thiébaut

We consider the problem of testing whether a single coefficient is equal to zero in linear models when the dimension of covariates $p$ can be up to a constant fraction of sample size $n$. In this regime, an important topic is to propose…

Statistics Theory · Mathematics 2025-05-06 Kaiyue Wen, Tengyao Wang, Yuhao Wang

Permutation testing in linear models, where the number of nuisance coefficients is smaller than the sample size, is a well-studied topic. The common approach of such tests is to permute residuals after regressing on the nuisance covariates.…

Methodology · Statistics 2020-10-09 Jesse Hemerik, Magne Thoresen, Livio Finos

We review approaches to statistical inference based on randomization. Permutation tests are treated as an important special case. Under a certain group invariance property, referred to as the "randomization hypothesis," randomization…

Econometrics · Economics 2025-02-05 David M. Ritzwoller, Joseph P. Romano, Azeem M. Shaikh

We present a sequential testing method to identify a practically significant effect. We build on the existing mixture sequential probability ratio test (mSPRT) that can sequentially test for a non-zero treatment effect by using a truncated…

Methodology · Statistics 2025-09-10 Kyu Min Shim