Related papers: ControlBurn: Feature Selection by Sparse Forests
Cost-sensitive feature selection describes a feature selection problem, where features raise individual costs for inclusion in a model. These costs allow to incorporate disfavored aspects of features, e.g. failure rates of as measuring…
The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection…
Feature selection has been proven a powerful preprocessing step for high-dimensional data analysis. However, most state-of-the-art methods tend to overlook the structural correlation information between pairwise samples, which may…
A vital problem in solving classification or regression problem is to apply feature engineering and variable selection on data before fed into models.One of a most popular feature engineering method is to discretisize continous variable…
While balancing covariates between groups is central for observational causal inference, selecting which features to balance remains a challenging problem. Kernel balancing is a promising approach that first estimates a kernel that captures…
This paper proposes FREEtree, a tree-based method for high dimensional longitudinal data with correlated features. Popular machine learning approaches, like Random Forests, commonly used for variable selection do not perform well when there…
Layer pruning offers a promising alternative to standard structured pruning, effectively reducing computational costs, latency, and memory footprint. While notable layer-pruning approaches aim to detect unimportant layers for removal, they…
In many high dimensional classification or regression problems set in a biological context, the complete identification of the set of informative features is often as important as predictive accuracy, since this can provide mechanistic…
Major complications arise from the recent increase in the amount of high-dimensional data, including high computational costs and memory requirements. Feature selection, which identifies the most relevant and informative attributes of a…
We propose a new optimization-based approach for feature selection in tree ensembles, an important problem in statistics and machine learning. Popular tree ensemble toolkits e.g., Gradient Boosted Trees and Random Forests support feature…
Interpretable models can have advantages over black-box models, and interpretability is essential for the application of machine learning in critical settings, such as aviation or medicine. This article introduces the LASSO-Clip-EN (LCEN)…
In recent years the importance of finding a meaningful pattern from huge datasets has become more challenging. Data miners try to adopt innovative methods to face this problem by applying feature selection methods. In this paper we propose…
Adapting machine learning algorithms to better handle the presence of clusters or batch effects within training datasets is important across a wide variety of biological applications. This article considers the effect of ensembling Random…
We study the effectiveness of non-uniform randomized feature selection in decision tree classification. We experimentally evaluate two feature selection methodologies, based on information extracted from the provided dataset: $(i)$…
Regression forests have long delivered state-of-the-art accuracy, often outperforming regression trees and even neural networks, but they suffer from limited interpretability as ensemble methods. In this work, we revisit forest pruning, an…
Feature selection is an important part of building a machine learning model. By eliminating redundant or misleading features from data, the machine learning model can achieve better performance while reducing the demand on com-puting…
Classification with a sparsity constraint on the solution plays a central role in many high dimensional machine learning applications. In some cases, the features can be grouped together so that entire subsets of features can be selected or…
Feature selection is an important task in many problems occurring in pattern recognition, bioinformatics, machine learning and data mining applications. The feature selection approach enables us to reduce the computation burden and the…
For various applications, the relations between the dependent and independent variables are highly nonlinear. Consequently, for large scale complex problems, neural networks and regression trees are commonly preferred over linear models…
It is becoming increasingly important for machine learning methods to make predictions that are interpretable as well as accurate. In many practical applications, it is of interest which features and feature interactions are relevant to the…