Related papers: Sequential Permutation Testing of Random Forest Va…

Scalable and Efficient Hypothesis Testing with Random Forests

Throughout the last decade, random forests have established themselves as among the most accurate and popular supervised learning methods. While their black-box nature has made their mathematical analysis difficult, recent work has…

Methodology · Statistics 2019-12-10 Tim Coleman, Wei Peng, Lucas Mentch

Variable importance in binary regression trees and forests

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally…

Machine Learning · Statistics 2009-09-29 Hemant Ishwaran

A Central Limit Theorem for the permutation importance measure

Random Forests have become a widely used tool in machine learning since their introduction in 2001, known for their strong performance in classification and regression tasks. One key feature of Random Forests is the Random Forest…

Statistics Theory · Mathematics 2025-12-18 Nico Föge, Lena Schmid, Marc Ditzhaus, Markus Pauly

Asymptotic Unbiasedness of the Permutation Importance Measure in Random Forest Models

Variable selection in sparse regression models is an important task as applications ranging from biomedical research to econometrics have shown. Especially for higher dimensional regression problems, for which the link function between…

Machine Learning · Statistics 2019-12-10 Burim Ramosaj, Markus Pauly

Nonparametric Variable Screening with Optimal Decision Stumps

Decision trees and their ensembles are endowed with a rich set of diagnostic tools for ranking and screening variables in a predictive model. Despite the widespread use of tree based variable importance measures, pinning down their…

Machine Learning · Statistics 2020-12-14 Jason M. Klusowski, Peter M. Tian

On the Use of Random Forest for Two-Sample Testing

Following the line of classification-based two-sample testing, tests based on the Random Forest classifier are proposed. The developed tests are easy to use, require almost no tuning, and are applicable for any distribution on…

Methodology · Statistics 2021-05-07 Simon Hediger, Loris Michel, Jeffrey Näf

Statistically Valid Variable Importance Assessment through Conditional Permutations

Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference…

Machine Learning · Computer Science 2023-10-27 Ahmad Chamma, Denis A. Engemann, Bertrand Thirion

Consistency of invariance-based randomization tests

Invariance-based randomization tests -- such as permutation tests, rotation tests, or sign changes -- are an important and widely used class of statistical methods. They allow drawing inferences under weak assumptions on the data…

Statistics Theory · Mathematics 2022-05-31 Edgar Dobriban

Dimension Reduction Forests: Local Variable Importance using Structured Random Forests

Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such…

Methodology · Statistics 2021-03-25 Joshua Daniel Loyal, Ruoqing Zhu, Yifan Cui, Xin Zhang

TRIP: A Nonparametric Test to Diagnose Biased Feature Importance Scores

Along with accurate prediction, understanding the contribution of each feature to the making of the prediction, i.e., the importance of the feature, is a desirable and arguably necessary component of a machine learning model. For a complex…

Machine Learning · Computer Science 2025-07-11 Aaron Foote, Danny Krizanc

MMD-based Variable Importance for Distributional Random Forest

Distributional Random Forest (DRF) is a flexible forest-based method to estimate the full conditional distribution of a multivariate output of interest given input variables. In this article, we introduce a variable importance algorithm for…

Machine Learning · Statistics 2024-02-15 Clément Bénard, Jeffrey Näf, Julie Josse

FACT: High-Dimensional Random Forests Inference

Quantifying the usefulness of individual features in random forests learning can greatly enhance its interpretability. Existing studies have shown that some popularly used feature importance measures for random forests suffer from the bias…

Machine Learning · Statistics 2023-11-14 Chien-Ming Chi, Yingying Fan, Jinchi Lv

Analysis of Conditional Randomisation and Permutation schemes with application to conditional independence testing

We study properties of two resampling scenarios: Conditional Randomisation and Conditional Permutation schemes, which are relevant for testing conditional independence of discrete random variables $X$ and $Y$ given a random variable $Z$.…

Statistics Theory · Mathematics 2023-04-14 Małgorzata Łazęcka, Bartosz Kołodziejek, Jan Mielniczuk

Generalized Permutation Framework for Testing Model Variable Significance

A common problem in machine learning is determining if a variable significantly contributes to a model's prediction performance. This problem is aggravated for datasets, such as gene expression datasets, that suffer the worst case of…

Methodology · Statistics 2023-10-13 Yue Wu, Ted Spaide, Kenji Nakamichi, Russell Van Gelder, Aaron Lee

Efficiently estimating small p-values in permutation tests using importance sampling and cross-entropy method

Permutation tests are widely used for statistical hypothesis testing when the sampling distribution of the test statistic under the null hypothesis is analytically intractable or unreliable due to finite sample sizes. One critical challenge…

Computation · Statistics 2023-08-29 Yang Shi, Huining Kang, Ji-Hyun Lee, Hui Jiang

Random forests for high-dimensional longitudinal data

Random forests is a state-of-the-art supervised machine learning method which behaves well in high-dimensional settings although some limitations may happen when $p$, the number of predictors, is much larger than the number of observations…

Methodology · Statistics 2019-02-01 Louis Capitaine, Robin Genuer, Rodolphe Thiébaut

Residual permutation test for regression coefficient testing

We consider the problem of testing whether a single coefficient is equal to zero in linear models when the dimension of covariates $p$ can be up to a constant fraction of sample size $n$. In this regime, an important topic is to propose…

Statistics Theory · Mathematics 2025-05-06 Kaiyue Wen, Tengyao Wang, Yuhao Wang

Permutation testing in high-dimensional linear models: an empirical investigation

Permutation testing in linear models, where the number of nuisance coefficients is smaller than the sample size, is a well-studied topic. The common approach of such tests is to permute residuals after regressing on the nuisance covariates.…

Methodology · Statistics 2020-10-09 Jesse Hemerik, Magne Thoresen, Livio Finos

Randomization Inference: Theory and Applications

We review approaches to statistical inference based on randomization. Permutation tests are treated as an important special case. Under a certain group invariance property, referred to as the "randomization hypothesis," randomization…

Econometrics · Economics 2025-02-05 David M. Ritzwoller, Joseph P. Romano, Azeem M. Shaikh

Sequential Test for Practical Significance: Truncated Mixture Sequential Probability Ratio Test

We present a sequential testing method to identify a practically significant effect. We build on the existing mixture sequential probability ratio test (mSPRT) that can sequentially test for a non-zero treatment effect by using a truncated…

Methodology · Statistics 2025-09-10 Kyu Min Shim