A Supervised Feature Selection Method For Mixed-Type Data using Density-based Feature Clustering
Abstract
Feature selection methods are widely used to reduce the high computational overhead and mitigate the curse of dimensionality in classifying high-dimensional data. Most conventional feature selection methods handle only homogeneous features, whereas real-world datasets usually contain a mixture of continuous and discrete features. Some recent mixed-type feature selection studies select only features with high relevance to the class labels and ignore the redundancy among features. Determining an appropriate feature subset is also a challenge. In this paper, a supervised feature selection method using density-based feature clustering (SFSDFC) is proposed to obtain an appropriate final feature subset for mixed-type data. SFSDFC first decomposes the feature space into a set of disjoint feature clusters using a novel density-based clustering method. An effective feature selection strategy is then employed to obtain a subset of important features with minimal redundancy from those feature clusters. Extensive experiments and comparison studies with five state-of-the-art methods are conducted using thirteen real-world benchmark datasets, and the results support the efficacy of the SFSDFC method.
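The two-stage idea in the abstract (cluster the features, then pick representatives from each cluster) can be illustrated with a minimal sketch. This is not the SFSDFC algorithm itself, whose details are not given in the abstract. Everything here is an assumption made for illustration: symmetric uncertainty as the feature similarity, equal-width binning of float columns so that continuous and discrete features share one measure, a simplified density-peaks style assignment with a hypothetical `cutoff` parameter, and the rule of keeping the single most class-relevant feature per cluster.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in nats) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """Normalized mutual information between two discrete sequences."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)

def discretize(column, bins=3):
    """Equal-width binning for float columns; other columns pass through.

    Assumption: float columns are continuous, everything else is discrete.
    """
    if not all(isinstance(v, float) for v in column):
        return list(column)
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in column]

def select_features(columns, labels, cutoff=0.5, bins=3):
    """Cluster features by density, then keep one feature per cluster.

    `cutoff` is a hypothetical distance threshold, not a parameter
    taken from the paper.
    """
    cols = [discretize(c, bins) for c in columns]
    m = len(cols)
    # Feature-to-feature distance: 1 - symmetric uncertainty.
    dist = [[1.0 - symmetric_uncertainty(cols[i], cols[j]) for j in range(m)]
            for i in range(m)]
    # Local density: number of other features within the cutoff distance.
    density = [sum(1 for j in range(m) if j != i and dist[i][j] <= cutoff)
               for i in range(m)]
    # Visit features from densest to sparsest; join the cluster of the
    # nearest already-visited feature if it is close enough, otherwise
    # start a new cluster.
    order = sorted(range(m), key=lambda i: (-density[i], i))
    cluster = {}
    for rank, i in enumerate(order):
        nearest = min(order[:rank], key=lambda j: dist[i][j], default=None)
        if nearest is not None and dist[i][nearest] <= cutoff:
            cluster[i] = cluster[nearest]
        else:
            cluster[i] = i
    # Relevance of each feature to the class labels.
    relevance = [symmetric_uncertainty(c, labels) for c in cols]
    best = {}
    for i in range(m):
        c = cluster[i]
        if c not in best or relevance[i] > relevance[best[c]]:
            best[c] = i
    return sorted(best.values()), cluster

# Toy mixed-type data: a categorical feature, a continuous near-duplicate
# of it, and an unrelated discrete feature.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    ["a", "a", "a", "a", "b", "b", "b", "b"],
    [0.1, 0.2, 0.1, 0.2, 0.9, 0.8, 0.9, 0.8],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
selected, cluster = select_features(columns, labels)
print(selected, cluster)
```

In this toy run the two redundant features fall into one cluster and only one of them is kept, which is the redundancy-reduction behaviour the abstract describes. A practical variant would likely also drop clusters whose best feature has low relevance; that step is omitted here to keep the sketch short.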
Cite
@article{arxiv.2111.08169,
title = {A Supervised Feature Selection Method For Mixed-Type Data using Density-based Feature Clustering},
author = {Xuyang Yan and Mrinmoy Sarkar and Biniam Gebru and Shabnam Nazmi and Abdollah Homaifar},
journal = {arXiv preprint arXiv:2111.08169},
year = {2021}
}
Comments
6 pages, 3 figures, 4 tables; accepted at IEEE SMC 2021