Related papers: Variable selection for mixed data clustering: a mo…

A tractable Multi-Partitions Clustering

In the framework of model-based clustering, a model allowing several latent class variables is proposed. This model assumes that the distribution of the observed data can be factorized into several independent blocks of variables. Each…

Methodology · Statistics 2018-01-23 Matthieu Marbac, Vincent Vandewalle

Variable selection for model-based clustering using the integrated complete-data likelihood

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…

Methodology · Statistics 2016-12-23 Marbac Matthieu, Sedki Mohammed

Enhancing the selection of a model-based clustering with external qualitative variables

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved in clustering the data. An approach is proposed in the model-based clustering…

Methodology · Statistics 2013-07-18 Jean-Patrick Baudry, Margarida Cardoso, Gilles Celeux, Maria José Amorim, Ana Sousa Ferreira

Model-based clustering for conditionally correlated categorical data

An extension of the latent class model is presented for clustering categorical data by relaxing the classical "class conditional independence assumption" of variables. This model consists in grouping the variables into inter-independent and…

Computation · Statistics 2015-10-01 Matthieu Marbac, Christophe Biernacki, Vincent Vandewalle

Model Based Clustering for Mixed Data: clustMD

A model based clustering procedure for data of mixed type, clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type.…

Methodology · Statistics 2015-11-06 Damien McParland, Isobel Claire Gormley

Identifying the number of clusters in discrete mixture models

Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. In this paper, we propose a…

Methodology · Statistics 2014-09-29 Cláudia Silvestre, Margarida G. M. S. Cardoso, Mário A. T. Figueiredo

Model-based clustering and segmentation of time series with changes in regime

Mixture model-based clustering, usually applied to multidimensional data, has become a popular approach in many data analysis problems, both for its good statistical properties and for the simplicity of implementation of the…

Methodology · Statistics 2013-12-30 Allou Samé, Faicel Chamroukhi, Gérard Govaert, Patrice Aknin

A model selection approach for clustering a multinomial sequence with non-negative factorization

We consider a problem of clustering a sequence of multinomial observations by way of a model selection criterion. We propose a form of a penalty term for the model selection procedure. Our approach subsumes both the conventional AIC and BIC…

Machine Learning · Statistics 2015-08-17 Nam H. Lee, Runze Tang, Carey E. Priebe, Michael Rosen

Model-based clustering via linear cluster-weighted models

A novel family of twelve mixture models with random covariates, nested in the linear $t$ cluster-weighted model (CWM), is introduced for model-based clustering. The linear $t$ CWM was recently presented as a robust alternative to the better…

Computation · Statistics 2015-03-10 Salvatore Ingrassia, Simona C. Minotti, Antonio Punzo

Model based clustering of multinomial count data

We consider the problem of inferring an unknown number of clusters in replicated multinomial data. Under a model based clustering point of view, this task can be treated by estimating finite mixtures of multinomial distributions with or…

Methodology · Statistics 2023-07-07 Panagiotis Papastamoulis

Non-parametric Multi-Partitions Clustering

In the framework of model-based clustering, a model, called multi-partitions clustering, allowing several latent class variables has been proposed. This model assumes that the distribution of the observed data can be factorized into several…

Methodology · Statistics 2023-01-09 Marie du Roy de Chaumaray, Vincent Vandewalle

Integrative Model-based clustering of microarray methylation and expression data

In many fields, researchers are interested in large and complex biological processes. Two important examples are gene expression and DNA methylation in genetics. One key problem is to identify aberrant patterns of these processes and…

Applications · Statistics 2012-10-03 Matthias Kormaksson, James G. Booth, Maria E. Figueroa, Ari Melnick

Variable selection for clustering with Gaussian mixture models: state of the art

Mixture models have become widely used in clustering, given the probabilistic framework on which they are based. However, for modern databases characterized by their large size, these models behave disappointingly in setting out the…

Machine Learning · Statistics 2017-02-01 Abdelghafour Talibi, Boujemâa Achchab, Rafik Lasri

A Bayesian Finite Mixture Model with Variable Selection for Data with Mixed-type Variables

Finite mixture modeling is an important branch of clustering methods and can be applied to data sets with mixed types of variables. However, challenges exist in its application. First, it typically relies on the EM algorithm which could be…

Machine Learning · Statistics 2019-05-10 Shu Wang, Jonathan G. Yabes, Chung-Chou H. Chang

Variable Selection for Clustering and Classification

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…

Computation · Statistics 2013-03-22 Jeffrey L. Andrews, Paul D. McNicholas

Variable Selection Methods for Model-based Clustering

Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to…

Methodology · Statistics 2018-09-25 Michael Fop, Thomas Brendan Murphy

Model Based Clustering of High-Dimensional Binary Data

We propose a mixture of latent trait models with common slope parameters (MCLT) for model-based clustering of high-dimensional binary data, a data type for which few established methods exist. Recent work on clustering of binary data, based…

Methodology · Statistics 2017-10-09 Yang Tang, Ryan P. Browne, Paul D. McNicholas

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie, Wei Pan, Xiaotong Shen

Model-Based Longitudinal Clustering with Varying Cluster Assignments

It is often of interest to perform clustering on longitudinal data, yet it is difficult to formulate an intuitive model for which estimation is computationally feasible. We propose a model-based clustering method for clustering objects that…

Methodology · Statistics 2020-05-19 Daniel K. Sewell, Yuguo Chen, William Bernhard, Tracy Sulkin

A Family of Mixture Models for Biclustering

Biclustering is used for simultaneous clustering of the observations and variables when there is no group structure known a priori. It is being increasingly used in bioinformatics, text analytics, etc. Previously, biclustering has…

Methodology · Statistics 2020-09-14 Wangshu Tu, Sanjeena Subedi