Related papers: Feature Selection via Regularized Trees

Gene selection with guided regularized random forest

The regularized random forest (RRF) was recently proposed for feature selection by building only one ensemble. In RRF the features are evaluated on a part of the training data at each tree node. We derive an upper bound for the number of…

Machine Learning · Computer Science 2013-06-21 Houtao Deng , George Runger

Generalizing Gain Penalization for Feature Selection in Tree-based Models

We develop a new approach for feature selection via gain penalization in tree-based models. First, we show that previous methods do not perform sufficient regularization and often exhibit sub-optimal out-of-sample performance, especially…

Machine Learning · Statistics 2020-06-16 Bruna Wundervald , Andrew Parnell , Katarina Domijan

Heterogeneous Random Forest

Random forest (RF) stands out as a highly favored machine learning approach for classification problems. The effectiveness of RF hinges on two key factors: the accuracy of individual trees and the diversity among them. In this study, we…

Machine Learning · Computer Science 2024-10-28 Ye-eun Kim , Seoung Yun Kim , Hyunjoong Kim

Regional Tree Regularization for Interpretability in Black Box Models

The lack of interpretability remains a barrier to the adoption of deep neural networks. Recently, tree regularization has been proposed to encourage deep neural networks to resemble compact, axis-aligned decision trees without significant…

Machine Learning · Computer Science 2020-03-17 Mike Wu , Sonali Parbhoo , Michael Hughes , Ryan Kindle , Leo Celi , Maurizio Zazzi , Volker Roth , Finale Doshi-Velez

Diversity Conscious Refined Random Forest

Random Forest (RF) is a widely used ensemble learning technique known for its robust classification performance across diverse domains. However, it often relies on hundreds of trees and all input features, leading to high inference cost and…

Machine Learning · Computer Science 2025-07-08 Sijan Bhattarai , Saurav Bhandari , Girija Bhusal , Saroj Shakya , Tapendra Pandey

Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success

Random forests remain among the most popular off-the-shelf supervised machine learning tools with a well-established track record of predictive accuracy in both regression and classification settings. Despite their empirical success as well…

Machine Learning · Statistics 2020-09-15 Lucas Mentch , Siyu Zhou

Random Subspace with Trees for Feature Selection Under Memory Constraints

Dealing with datasets of very high dimension is a major challenge in machine learning. In this paper, we consider the problem of feature selection in applications where the memory is not large enough to contain all features. In this…

Machine Learning · Statistics 2017-09-07 Antonio Sutera , Célia Châtel , Gilles Louppe , Louis Wehenkel , Pierre Geurts

Interpreting Models via Single Tree Approximation

We propose a procedure to build a decision tree which approximates the performance of complex machine learning models. This single approximation tree can be used to interpret and simplify the predicting pattern of random forests (RFs) and…

Methodology · Statistics 2016-10-31 Yichen Zhou , Giles Hooker

Cluster-Based Random Forest Visualization and Interpretation

Random forests are a machine learning method used to automatically classify datasets and consist of a multitude of decision trees. While these random forests often have higher performance and generalize better than a single decision tree,…

Machine Learning · Computer Science 2025-07-31 Max Sondag , Christofer Meinecke , Dennis Collaris , Tatiana von Landesberger , Stef van den Elzen

Improved Weighted Random Forest for Classification Problems

Several studies have shown that combining machine learning models in an appropriate way will introduce improvements in the individual predictions made by the base models. The key to make well-performing ensemble model is in the diversity of…

Machine Learning · Computer Science 2021-03-01 Mohsen Shahhosseini , Guiping Hu

Fair Forests: Regularized Tree Induction to Minimize Model Bias

The potential lack of fairness in the outputs of machine learning algorithms has recently gained attention both within the research community as well as in society more broadly. Surprisingly, there is no prior work developing tree-induction…

Machine Learning · Statistics 2017-12-25 Edward Raff , Jared Sylvester , Steven Mills

Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a Random Forest

Due to their long-standing reputation as excellent off-the-shelf predictors, random forests continue remain a go-to model of choice for applied statisticians and data scientists. Despite their widespread use, however, until recently, little…

Machine Learning · Statistics 2021-04-01 Siyu Zhou , Lucas Mentch

Regularized regression on compositional trees with application to MRI analysis

A compositional tree refers to a tree structure on a set of random variables where each random variable is a node and composition occurs at each non-leaf node of the tree. As a generalization of compositional data, compositional trees…

Methodology · Statistics 2021-04-20 Bingkai Wang , Brian S. Caffo , Xi Luo , Chin-Fu Liu , Andreia V. Faria , Michael I. Miller , Yi Zhao

Top-$k$ Regularization for Supervised Feature Selection

Feature selection identifies subsets of informative features and reduces dimensions in the original feature space, helping provide insights into data generation or a variety of domain problems. Existing methods mainly depend on feature…

Machine Learning · Computer Science 2021-06-07 Xinxing Wu , Qiang Cheng

Bayesian Model Selection on Random Networks

A general Bayesian framework for model selection on random network models regarding their features is considered. The goal is to develop a principle Bayesian model selection approach to compare different fittable, not necessarily nested,…

Methodology · Statistics 2020-04-30 Papamichalis Marios

Targeting predictors in random forest regression

Random forest regression (RF) is an extremely popular tool for the analysis of high-dimensional data. Nonetheless, its benefits may be lessened in sparse settings due to weak predictors, and a pre-estimation dimension reduction (targeting)…

Econometrics · Economics 2020-11-09 Daniel Borup , Bent Jesper Christensen , Nicolaj Nørgaard Mühlbach , Mikkel Slot Nielsen

Best-scored Random Forest Classification

We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates…

Machine Learning · Statistics 2019-05-28 Hanyuan Hang , Xiaoyu Liu , Ingo Steinwart

Binary Stochastic Filtering: feature selection and beyond

Feature selection is one of the most decisive tools in understanding data and machine learning models. Among other methods, sparsity induced by $L^{1}$ penalty is one of the simplest and best studied approaches to this problem. Although…

Machine Learning · Computer Science 2020-07-09 Andrii Trelin , Aleš Procházka

Guided Random Forest in the RRF Package

Random Forest (RF) is a powerful supervised learner and has been popularly used in many applications such as bioinformatics. In this work we propose the guided random forest (GRF) for feature selection. Similar to a feature selection method…

Machine Learning · Computer Science 2013-11-19 Houtao Deng

Nonparametric Feature Selection by Random Forests and Deep Neural Networks

Random forests are a widely used machine learning algorithm, but their computational efficiency is undermined when applied to large-scale datasets with numerous instances and useless features. Herein, we propose a nonparametric feature…

Machine Learning · Computer Science 2022-01-19 Xiaojun Mao , Liuhua Peng , Zhonglei Wang