Related papers: Flexible Variable Selection for Clustering and Cla…

Variable selection for clustering with Gaussian mixture models: state of the art

Mixture models have become widely used in clustering because of the probabilistic framework on which they are based. However, for modern databases characterized by their large size, these models behave disappointingly in setting out the…

Machine Learning · Statistics 2017-02-01 Abdelghafour Talibi, Boujemâa Achchab, Rafik Lasri

Variable Selection Methods for Model-based Clustering

Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to…

Methodology · Statistics 2018-09-25 Michael Fop, Thomas Brendan Murphy

Variable Selection for Clustering and Classification

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…

Computation · Statistics 2013-03-22 Jeffrey L. Andrews, Paul D. McNicholas

Variable selection for model-based clustering using the integrated complete-data likelihood

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…

Methodology · Statistics 2016-12-23 Marbac Matthieu, Sedki Mohammed

clustvarsel: A Package Implementing Variable Selection for Model-based Clustering in R

Finite mixture modelling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provide…

Computation · Statistics 2014-11-04 Luca Scrucca, Adrian E. Raftery

Variable selection for mixed data clustering: a model-based approach

We propose two approaches for selecting variables in latent class analysis (i.e., mixture model assuming within component independence), which is the common model-based clustering method for mixed data. The first approach consists in…

Computation · Statistics 2017-03-08 Matthieu Marbac, Mohammed Sedki

VARCLUST: clustering variables using dimensionality reduction

VARCLUST algorithm is proposed for clustering variables under the assumption that variables in a given cluster are linear combinations of a small number of hidden latent variables, corrupted by the random noise. The entire clustering task…

Computation · Statistics 2020-12-21 Piotr Sobczyk, Stanislaw Wilczynski, Malgorzata Bogdan, Piotr Graczyk, Julie Josse, Fabien Panloup, Valérie Seegers, Mateusz Staniak

Enhancing the selection of a model-based clustering with external qualitative variables

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved to cluster the data. An approach is proposed in the model-based clustering…

Methodology · Statistics 2013-07-18 Jean-Patrick Baudry, Margarida Cardoso, Gilles Celeux, Maria José Amorim, Ana Sousa Ferreira

Variable selection in model-based clustering and discriminant analysis with a regularization approach

Relevant methods of variable selection have been proposed in model-based clustering and classification. These methods are making use of backward or forward procedures to define the roles of the variables. Unfortunately, these stepwise…

Computation · Statistics 2017-05-03 Gilles Celeux, Cathy Maugis-Rabusseau, Mohammed Sedki

Discriminative variable selection for clustering with the sparse Fisher-EM algorithm

The interest in variable selection for clustering has increased recently due to the growing need in clustering high-dimensional data. Variable selection allows in particular to ease both the clustering and the interpretation of the results.…

Methodology · Statistics 2012-04-11 Charles Bouveyron, Camille Brunet

Skewed Distributions or Transformations? Modelling Skewness for a Cluster Analysis

Because of its mathematical tractability, the Gaussian mixture model holds a special place in the literature for clustering and classification. For all its benefits, however, the Gaussian mixture model poses problems when the data is skewed…

Applications · Statistics 2020-11-19 Michael P. B. Gallaugher, Paul D. McNicholas, Volodymyr Melnykov, Xuwen Zhu

A Model-Based Clustering Approach for Bounded Data Using Transformation-Based Gaussian Mixture Models

The clustering of bounded data presents unique challenges in statistical analysis due to the constraints imposed on the data values. This paper introduces a novel method for model-based clustering specifically designed for bounded data.…

Methodology · Statistics 2025-05-16 Luca Scrucca

Robust Bayesian Model Selection for Variable Clustering with the Gaussian Graphical Model

Variable clustering is important for explanatory analysis. However, only few dedicated methods for variable clustering with the Gaussian graphical model have been proposed. Even more severe, small insignificant partial correlations due to…

Applications · Statistics 2018-06-18 Daniel Andrade, Akiko Takeda, Kenji Fukumizu

Flexible Mixture Modeling with the Polynomial Gaussian Cluster-Weighted Model

In the mixture modeling frame, this paper presents the polynomial Gaussian cluster-weighted model (CWM). It extends the linear Gaussian CWM, for bivariate data, in a twofold way. Firstly, it allows for possible nonlinear dependencies in the…

Methodology · Statistics 2012-07-05 Antonio Punzo

Clustering and Variable Selection in the Presence of Mixed Variable Types and Missing Data

We consider the problem of model-based clustering in the presence of many correlated, mixed continuous and discrete variables, some of which may have missing values. Discrete variables are treated with a latent continuous variable approach…

Methodology · Statistics 2018-03-13 Curtis Storlie, Scott Myers, S Katusic, Amy Weaver, Robert Voigt, Robert Colligan, Paul Croarkin, Ruth Stoeckel, John Port

Unsupervised Variable Selection for Ultrahigh-Dimensional Clustering Analysis

Compared to supervised variable selection, the research on unsupervised variable selection is far behind. A forward partial-variable clustering full-variable loss (FPCFL) method is proposed for the corresponding challenges. An advantage is…

Methodology · Statistics 2024-12-02 Tonglin Zhang, Huyunting Huang

Variable selection in discriminant analysis for mixed variables and several groups

We propose a method for variable selection in discriminant analysis with mixed categorical and continuous variables. This method is based on a criterion that permits to reduce the variable selection problem to a problem of estimating…

Statistics Theory · Mathematics 2017-03-14 Alban Mbina Mbina, Guy Martial Nkiet, Fulgence Eyi Obiang

Parsimonious Ultrametric Manly Mixture Models

A family of parsimonious ultrametric mixture models with the Manly transformation is developed for clustering high-dimensional and asymmetric data. Advances in Gaussian mixture modeling sufficiently handle high-dimensional data but struggle…

Methodology · Statistics 2025-12-16 Alexa A. Sochaniwsky, Paul D. McNicholas

Truecluster: robust scalable clustering with model selection

Data-based classification is fundamental to most branches of science. While recent years have brought enormous progress in various areas of statistical computing and clustering, some general challenges in clustering remain: model selection,…

Artificial Intelligence · Computer Science 2007-06-13 Jens Oehlschlägel

Graphical model-based clustering of categorical data

Clustering multivariate data is a pervasive task in many applied problems, particularly in social studies and life science. Model-based approaches to clustering rely on mixture models, where each mixture component corresponds to the kernel…

Methodology · Statistics 2026-01-22 Laura Ferrini, Federico Castelletti