A Supervised Feature Selection Method For Mixed-Type Data using Density-based Feature Clustering

Xuyang Yan; Mrinmoy Sarkar; Biniam Gebru; Shabnam Nazmi; Abdollah Homaifar

Machine Learning 2021-11-17 v1

Authors: Xuyang Yan, Mrinmoy Sarkar, Biniam Gebru, Shabnam Nazmi, Abdollah Homaifar

Abstract

Feature selection methods are widely used to reduce the high computational overhead and mitigate the curse of dimensionality in classifying high-dimensional data. Most conventional feature selection methods handle only homogeneous features, whereas real-world datasets usually contain a mixture of continuous and discrete features. Some recent mixed-type feature selection studies select only features with high relevance to the class labels and ignore redundancy among features. Determining an appropriate feature subset is also a challenge. In this paper, a supervised feature selection method using density-based feature clustering (SFSDFC) is proposed to obtain an appropriate final feature subset for mixed-type data. SFSDFC decomposes the feature space into a set of disjoint feature clusters using a novel density-based clustering method. Then, an effective feature selection strategy is employed to obtain a subset of important features with minimal redundancy from those feature clusters. Extensive experiments and comparisons with five state-of-the-art methods on thirteen real-world benchmark datasets support the efficacy of SFSDFC.
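The two-stage idea in the abstract (group features into disjoint clusters, then keep a relevant, non-redundant subset) can be illustrated with a minimal sketch. This is not the SFSDFC algorithm: the abstract does not specify the density measure, the clustering rule, or the selection strategy. The sketch assumes equal-width binning for continuous features, symmetric uncertainty as the feature-to-feature and feature-to-label similarity, a simple threshold-linked clustering as a stand-in for the density-based step, and one representative per cluster. All function names and the parameters `eps`, `bins`, and `min_relevance` are hypothetical.

```python
import math
from collections import Counter


def discretize(column, bins=4):
    """Map a continuous column to equal-width bin indices (assumed preprocessing)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in column]


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(a, b):
    """SU(a, b) = 2 * I(a; b) / (H(a) + H(b)), a similarity in [0, 1]."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 0.0
    mutual_info = max(0.0, ha + hb - entropy(list(zip(a, b))))
    return 2 * mutual_info / (ha + hb)


def cluster_features(columns, eps=0.5):
    """Stand-in clustering: link features whose SU is at least eps and
    return the connected components as disjoint feature clusters."""
    unvisited = set(range(len(columns)))
    clusters = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if symmetric_uncertainty(columns[i], columns[j]) >= eps]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters


def select_features(columns, labels, continuous=(), eps=0.5, bins=4,
                    min_relevance=0.0):
    """Keep the most label-relevant feature of each cluster (assumed rule),
    dropping representatives whose relevance is below min_relevance."""
    cols = [discretize(c, bins) if i in continuous else list(c)
            for i, c in enumerate(columns)]
    selected = []
    for cluster in cluster_features(cols, eps):
        best = max(cluster, key=lambda i: symmetric_uncertainty(cols[i], labels))
        if symmetric_uncertainty(cols[best], labels) >= min_relevance:
            selected.append(best)
    return sorted(selected)


# Toy mixed-type data: feature 1 is a categorical copy of continuous feature 0,
# and feature 2 is unrelated to both the labels and the other features.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
    ["a", "a", "a", "a", "b", "b", "b", "b"],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
selected = select_features(columns, labels, continuous={0}, bins=2)
relevant_only = select_features(columns, labels, continuous={0}, bins=2,
                                min_relevance=0.5)
```

On this toy data the redundant pair (features 0 and 1) falls into one cluster and contributes a single representative, which is the redundancy-reduction effect the abstract describes; the relevance threshold is an extra assumption added here to discard uninformative clusters.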

Keywords

feature selection; cluster analysis; subspace clustering

Cite

@article{arxiv.2111.08169,
  title  = {A Supervised Feature Selection Method For Mixed-Type Data using Density-based Feature Clustering},
  author = {Xuyang Yan and Mrinmoy Sarkar and Biniam Gebru and Shabnam Nazmi and Abdollah Homaifar},
  journal= {arXiv preprint arXiv:2111.08169},
  year   = {2021}
}

Comments

6 pages, 3 figures, 4 tables; accepted by IEEE SMC 2021
