Related papers: Model-independent variable selection via the rule-…

One Permutation Is All You Need: Fast, Reliable Variable Importance and Model Stress-Testing

Reliable estimation of feature contributions in machine learning models is essential for trust, transparency and regulatory compliance, especially when models are proprietary or otherwise operate as black boxes. While permutation-based…

Machine Learning · Statistics 2025-12-24 Albert Dorador

Variable selection for general index models via sliced inverse regression

Variable selection, also known as feature selection in machine learning, plays an important role in modeling high dimensional data and is key to data-driven scientific discoveries. We consider here the problem of detecting influential…

Methodology · Statistics 2014-09-24 Bo Jiang, Jun S. Liu

A Simple and Effective Model-Based Variable Importance Measure

In the era of "big data", it is becoming more of a challenge to not only build state-of-the-art predictive models, but also gain an understanding of what's really going on in the data. For example, it is often of interest to know which, if…

Machine Learning · Statistics 2018-05-15 Brandon M. Greenwell, Bradley C. Boehmke, Andrew J. McCarthy

A Computational Exploration of Emerging Methods of Variable Importance Estimation

Estimating the importance of variables is an essential task in modern machine learning. This helps to evaluate the goodness of a feature in a given model. Several techniques for estimating the importance of variables have been developed…

Machine Learning · Statistics 2022-08-09 Louis Mozart Kamdem, Ernest Fokoue

Variable selection in multiple regression with random design

We propose a method for variable selection in multiple regression with random predictors. This method is based on a criterion that makes it possible to reduce the variable selection problem to a problem of estimating suitable permutation and…

Statistics Theory · Mathematics 2015-06-29 Alban Mbina Mbina, Guy Martial Nkiet, Assi Nguessan

Variable selection for model-based clustering using the integrated complete-data likelihood

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…

Methodology · Statistics 2016-12-23 Marbac Matthieu, Sedki Mohammed

Efficient Test-based Variable Selection for High-dimensional Linear Models

Variable selection plays a fundamental role in high-dimensional data analysis. Various methods have been developed for variable selection in recent years. Well-known examples are forward stepwise regression (FSR) and least angle regression…

Methodology · Statistics 2018-02-01 Siliang Gong, Kai Zhang, Yufeng Liu

Asymptotic Unbiasedness of the Permutation Importance Measure in Random Forest Models

Variable selection in sparse regression models is an important task as applications ranging from biomedical research to econometrics have shown. Especially for higher dimensional regression problems, for which the link function between…

Machine Learning · Statistics 2019-12-10 Burim Ramosaj, Markus Pauly

Generalized Permutation Framework for Testing Model Variable Significance

A common problem in machine learning is determining if a variable significantly contributes to a model's prediction performance. This problem is aggravated for datasets, such as gene expression datasets, that suffer the worst case of…

Methodology · Statistics 2023-10-13 Yue Wu, Ted Spaide, Kenji Nakamichi, Russell Van Gelder, Aaron Lee

A Transparent and Nonlinear Method for Variable Selection

Variable selection is a procedure to attain the truly important predictors from inputs. Complex nonlinear dependencies and strong coupling pose great challenges for variable selection in high-dimensional data. In addition, real-world…

Methodology · Statistics 2023-07-04 Keyao Wang, Huiwen Wang, Jichang Zhao, Lihong Wang

An Ensemble Approach toward Automated Variable Selection for Network Anomaly Detection

While variable selection is essential to optimize the learning complexity by prioritizing features, automating the selection process is preferred since it requires laborious efforts with intensive analysis otherwise. However, it is not an…

Machine Learning · Computer Science 2019-10-29 Makiya Nakashima, Alex Sim, Youngsoo Kim, Jonghyun Kim, Jinoh Kim

Determination of class-specific variables in nonparametric multiple-class classification

As technology has advanced, collecting data via automatic collection devices has become popular, so we commonly face data sets with lengthy variables, especially when these data sets are collected without specific research goals beforehand. It…

Machine Learning · Statistics 2022-05-10 Wan-Ping Nicole Chen, Yuan-chin Ivan Chang

Variable Selection Using Bayesian Additive Regression Trees

Variable selection is an important statistical problem. This problem becomes more challenging when the candidate predictors are of mixed type (e.g. continuous and binary) and impact the response variable in nonlinear and/or non-additive…

Methodology · Statistics 2021-12-30 Chuji Luo, Michael J. Daniels

Variable selection in discriminant analysis for mixed variables and several groups

We propose a method for variable selection in discriminant analysis with mixed categorical and continuous variables. This method is based on a criterion that makes it possible to reduce the variable selection problem to a problem of estimating…

Statistics Theory · Mathematics 2017-03-14 Alban Mbina Mbina, Guy Martial Nkiet, Fulgence Eyi Obiang

Variable selection in semiparametric regression modeling

In this paper, we are concerned with how to select significant variables in semiparametric modeling. Variable selection for semiparametric regression models consists of two components: model selection for nonparametric components and…

Statistics Theory · Mathematics 2008-12-18 Runze Li, Hua Liang

Challenges in Variable Importance Ranking Under Correlation

Variable importance plays a pivotal role in interpretable machine learning as it helps measure the impact of factors on the output of the prediction model. Model agnostic methods based on the generation of "null" features via permutation…

Machine Learning · Statistics 2024-02-07 Annie Liang, Thomas Jemielita, Andy Liaw, Vladimir Svetnik, Lingkang Huang, Richard Baumgartner, Jason M. Klusowski

Variable selection via thresholding

Variable selection comprises an important step in many modern statistical inference procedures. In the regression setting, when estimators cannot shrink irrelevant signals to zero, covariates without relationships to the response often…

Statistics Theory · Mathematics 2025-03-28 Ka Long Keith Ho, Hien Duy Nguyen

Time-Varying Propensity Score to Bridge the Gap between the Past and Present

Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods…

Machine Learning · Computer Science 2024-05-03 Rasool Fakoor, Jonas Mueller, Zachary C. Lipton, Pratik Chaudhari, Alexander J. Smola

Variable selection with missing data in both covariates and outcomes: Imputation and machine learning

The missing data issue is ubiquitous in health studies. Variable selection in the presence of both missing covariates and outcomes is an important statistical research topic but has been less studied. Existing literature focuses on…

Methodology · Statistics 2021-07-09 Liangyuan Hu, Jung-Yi Joyce Lin, Jiayi Ji

Variable selection for Gaussian processes via sensitivity analysis of the posterior predictive distribution

Variable selection for Gaussian process models is often done using automatic relevance determination, which uses the inverse length-scale parameter of each input variable as a proxy for variable relevance. This implicitly determined…

Methodology · Statistics 2019-04-24 Topi Paananen, Juho Piironen, Michael Riis Andersen, Aki Vehtari