Related papers: Variable Selection for Clustering and Classificati…

Variable Selection Methods for Model-based Clustering

Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to…

Methodology · Statistics 2018-09-25 Michael Fop , Thomas Brendan Murphy

Variable selection for clustering with Gaussian mixture models: state of the art

The mixture models have become widely used in clustering, given its probabilistic framework in which its based, however, for modern databases that are characterized by their large size, these models behave disappointingly in setting out the…

Machine Learning · Statistics 2017-02-01 Abdelghafour Talibi , Boujemâa Achchab , Rafik Lasri

Flexible Variable Selection for Clustering and Classification

The importance of variable selection for clustering has been recognized for some time, and mixture models are well-established as a statistical approach to clustering. Yet, the literature on variable selection in model-based clustering…

Methodology · Statistics 2024-02-13 Mackenzie R. Neal , Paul D. McNicholas

Variable selection for model-based clustering using the integrated complete-data likelihood

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…

Methodology · Statistics 2016-12-23 Marbac Matthieu , Sedki Mohammed

Discriminative variable selection for clustering with the sparse Fisher-EM algorithm

The interest in variable selection for clustering has increased recently due to the growing need in clustering high-dimensional data. Variable selection allows in particular to ease both the clustering and the interpretation of the results.…

Methodology · Statistics 2012-04-11 Charles Bouveyron , Camille Brunet

Feature and Variable Selection in Classification

The amount of information in the form of features and variables avail- able to machine learning algorithms is ever increasing. This can lead to classifiers that are prone to overfitting in high dimensions, high di- mensional models do not…

Machine Learning · Computer Science 2014-02-12 Aaron Karper

Variable selection in discriminant analysis for mixed variables and several groups

We propose a method for variable selection in discriminant analysis with mixed categorical and continuous variables. This method is based on a criterion that permits to reduce the variable selection problem to a problem of estimating…

Statistics Theory · Mathematics 2017-03-14 Alban Mbina Mbina , Guy Martial Nkiet , Fulgence Eyi Obiang

Combining clustering of variables and feature selection using random forests

Standard approaches to tackle high-dimensional supervised classification problem often include variable selection and dimension reduction procedures. The novel methodology proposed in this paper combines clustering of variables and feature…

Statistics Theory · Mathematics 2018-11-07 Marie Chavent , Robin Genuer , Jerome Saracco

Enhancing the selection of a model-based clustering with external qualitative variables

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved to cluster the data. An approach is proposed in the model-based clustering…

Methodology · Statistics 2013-07-18 Jean-Patrick Baudry , Margarida Cardoso , Gilles Celeux , Maria José Amorim , Ana Sousa Ferreira

Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes.…

Artificial Intelligence · Computer Science 2007-05-23 Zengyou He , Xiaofei Xu , Shengchun Deng

DiscoVars: A New Data Analysis Perspective -- Application in Variable Selection for Clustering

We present a new data analysis perspective to determine variable importance regardless of the underlying learning task. Traditionally, variable selection is considered an important step in supervised learning for both classification and…

Machine Learning · Computer Science 2023-04-11 Ayhan Demiriz

Variable selection in model-based clustering and discriminant analysis with a regularization approach

Relevant methods of variable selection have been proposed in model-based clustering and classification. These methods are making use of backward or forward procedures to define the roles of the variables. Unfortunately, these stepwise…

Computation · Statistics 2017-05-03 Gilles Celeux , Cathy Maugis-Rabusseau , Mohammed Sedki

A review of systematic selection of clustering algorithms and their evaluation

Data analysis plays an indispensable role for value creation in industry. Cluster analysis in this context is able to explore given datasets with little or no prior knowledge and to identify unknown patterns. As (big) data complexity…

Machine Learning · Computer Science 2021-06-25 Marc Wegmann , Domenique Zipperling , Jonas Hillenbrand , Jürgen Fleischer

Clustering High-dimensional Data via Feature Selection

High-dimensional clustering analysis is a challenging problem in statistics and machine learning, with broad applications such as the analysis of microarray data and RNA-seq data. In this paper, we propose a new clustering procedure called…

Methodology · Statistics 2022-10-31 Tianqi Liu , Yu Lu , Biqing Zhu , Hongyu Zhao

Conditional Variable Selection for Intelligent Test

Intelligent test requires efficient and effective analysis of high-dimensional data in a large scale. Traditionally, the analysis is often conducted by human experts, but it is not scalable in the era of big data. To tackle this challenge,…

Machine Learning · Computer Science 2022-07-04 Yiwen Liao , Tianjie Ge , Raphaël Latty , Bin Yang

Clustering Plotted Data by Image Segmentation

Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar…

Machine Learning · Computer Science 2021-10-12 Tarek Naous , Srinjay Sarkar , Abubakar Abid , James Zou

A Two-Stage Variable Selection Approach for Correlated High Dimensional Predictors

When fitting statistical models, some predictors are often found to be correlated with each other, and functioning together. Many group variable selection methods are developed to select the groups of predictors that are closely related to…

Methodology · Statistics 2021-03-25 Zhiyuan Li

High Dimensional Cluster Analysis Using Path Lengths

A hierarchical scheme for clustering data is presented which applies to spaces with a high number of dimension ($N_{_{D}}>3$). The data set is first reduced to a smaller set of partitions (multi-dimensional bins). Multiple clustering…

Data Analysis, Statistics and Probability · Physics 2017-10-16 Kevin McIlhany , Stephen Wiggins

Variable selection for mixed data clustering: a model-based approach

We propose two approaches for selecting variables in latent class analysis (i.e.,mixture model assuming within component independence), which is the common model-based clustering method for mixed data. The first approach consists in…

Computation · Statistics 2017-03-08 Matthieu Marbac , Mohammed Sedki

Clustering validity based on the most similarity

One basic requirement of many studies is the necessity of classifying data. Clustering is a proposed method for summarizing networks. Clustering methods can be divided into two categories named model-based approaches and algorithmic…

Machine Learning · Computer Science 2013-02-19 Raheleh Namayandeh , Farzad Didehvar , Zahra Shojaei