Related papers: Flexible Variable Selection for Clustering and Cla…
The mixture models have become widely used in clustering, given its probabilistic framework in which its based, however, for modern databases that are characterized by their large size, these models behave disappointingly in setting out the…
Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to…
As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…
Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…
Finite mixture modelling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provide…
We propose two approaches for selecting variables in latent class analysis (i.e.,mixture model assuming within component independence), which is the common model-based clustering method for mixed data. The first approach consists in…
VARCLUST algorithm is proposed for clustering variables under the assumption that variables in a given cluster are linear combinations of a small number of hidden latent variables, corrupted by the random noise. The entire clustering task…
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved to cluster the data. An approach is proposed in the model-based clustering…
Relevant methods of variable selection have been proposed in model-based clustering and classification. These methods are making use of backward or forward procedures to define the roles of the variables. Unfortunately, these stepwise…
The interest in variable selection for clustering has increased recently due to the growing need in clustering high-dimensional data. Variable selection allows in particular to ease both the clustering and the interpretation of the results.…
Because of its mathematical tractability, the Gaussian mixture model holds a special place in the literature for clustering and classification. For all its benefits, however, the Gaussian mixture model poses problems when the data is skewed…
The clustering of bounded data presents unique challenges in statistical analysis due to the constraints imposed on the data values. This paper introduces a novel method for model-based clustering specifically designed for bounded data.…
Variable clustering is important for explanatory analysis. However, only few dedicated methods for variable clustering with the Gaussian graphical model have been proposed. Even more severe, small insignificant partial correlations due to…
In the mixture modeling frame, this paper presents the polynomial Gaussian cluster-weighted model (CWM). It extends the linear Gaussian CWM, for bivariate data, in a twofold way. Firstly, it allows for possible nonlinear dependencies in the…
We consider the problem of model-based clustering in the presence of many correlated, mixed continuous and discrete variables, some of which may have missing values. Discrete variables are treated with a latent continuous variable approach…
Compared to supervised variable selection, the research on unsupervised variable selection is far behind. A forward partial-variable clustering full-variable loss (FPCFL) method is proposed for the corresponding challenges. An advantage is…
We propose a method for variable selection in discriminant analysis with mixed categorical and continuous variables. This method is based on a criterion that permits to reduce the variable selection problem to a problem of estimating…
A family of parsimonious ultrametric mixture models with the Manly transformation is developed for clustering high-dimensional and asymmetric data. Advances in Gaussian mixture modeling sufficiently handle high-dimensional data but struggle…
Data-based classification is fundamental to most branches of science. While recent years have brought enormous progress in various areas of statistical computing and clustering, some general challenges in clustering remain: model selection,…
Clustering multivariate data is a pervasive task in many applied problems, particularly in social studies and life science. Model-based approaches to clustering rely on mixture models, where each mixture component corresponds to the kernel…