English
Related papers

Related papers: Variable selection for mixed data clustering: a mo…

200 papers

In the framework of model-based clustering, a model allowing several latent class variables is proposed. This model assumes that the distribution of the observed data can be factorized into several independent blocks of variables. Each…

Methodology · Statistics 2018-01-23 Matthieu Marbac , Vincent Vandewalle

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…

Methodology · Statistics 2016-12-23 Marbac Matthieu , Sedki Mohammed

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved to cluster the data. An approach is proposed in the model-based clustering…

An extension of the latent class model is presented for clustering categorical data by relaxing the classical "class conditional independence assumption" of variables. This model consists in grouping the variables into inter-independent and…

Computation · Statistics 2015-10-01 Matthieu Marbac , Christophe Biernacki , Vincent Vandewalle

A model based clustering procedure for data of mixed type, clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type.…

Methodology · Statistics 2015-11-06 Damien McParland , Isobel Claire Gormley

Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. In this paper, we propose a…

Methodology · Statistics 2014-09-29 Cláudia Silvestre , Margarida G. M. S. Cardoso , Mário A. T. Figueiredo

Mixture model-based clustering, usually applied to multidimensional data, has become a popular approach in many data analysis problems, both for its good statistical properties and for the simplicity of implementation of the…

Methodology · Statistics 2013-12-30 Allou Samé , Faicel Chamroukhi , Gérard Govaert , Patrice Aknin

We consider a problem of clustering a sequence of multinomial observations by way of a model selection criterion. We propose a form of a penalty term for the model selection procedure. Our approach subsumes both the conventional AIC and BIC…

Machine Learning · Statistics 2015-08-17 Nam H. Lee , Runze Tang , Carey E. Priebe , Michael Rosen

A novel family of twelve mixture models with random covariates, nested in the linear $t$ cluster-weighted model (CWM), is introduced for model-based clustering. The linear $t$ CWM was recently presented as a robust alternative to the better…

Computation · Statistics 2015-03-10 Salvatore Ingrassia , Simona C. Minotti , Antonio Punzo

We consider the problem of inferring an unknown number of clusters in replicated multinomial data. Under a model based clustering point of view, this task can be treated by estimating finite mixtures of multinomial distributions with or…

Methodology · Statistics 2023-07-07 Panagiotis Papastamoulis

In the framework of model-based clustering, a model, called multi-partitions clustering, allowing several latent class variables has been proposed. This model assumes that the distribution of the observed data can be factorized into several…

Methodology · Statistics 2023-01-09 Marie du Roy de Chaumaray , Vincent Vandewalle

In many fields, researchers are interested in large and complex biological processes. Two important examples are gene expression and DNA methylation in genetics. One key problem is to identify aberrant patterns of these processes and…

Applications · Statistics 2012-10-03 Matthias Kormaksson , James G. Booth , Maria E. Figueroa , Ari Melnick

The mixture models have become widely used in clustering, given its probabilistic framework in which its based, however, for modern databases that are characterized by their large size, these models behave disappointingly in setting out the…

Machine Learning · Statistics 2017-02-01 Abdelghafour Talibi , Boujemâa Achchab , Rafik Lasri

Finite mixture model is an important branch of clustering methods and can be applied on data sets with mixed types of variables. However, challenges exist in its applications. First, it typically relies on the EM algorithm which could be…

Machine Learning · Statistics 2019-05-10 Shu Wang , Jonathan G. Yabes , Chung-Chou H. Chang

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…

Computation · Statistics 2013-03-22 Jeffrey L. Andrews , Paul D. McNicholas

Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to…

Methodology · Statistics 2018-09-25 Michael Fop , Thomas Brendan Murphy

We propose a mixture of latent trait models with common slope parameters (MCLT) for model-based clustering of high-dimensional binary data, a data type for which few established methods exist. Recent work on clustering of binary data, based…

Methodology · Statistics 2017-10-09 Yang Tang , Ryan P. Browne , Paul D. McNicholas

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie , Wei Pan , Xiaotong Shen

It is often of interest to perform clustering on longitudinal data, yet it is difficult to formulate an intuitive model for which estimation is computationally feasible. We propose a model-based clustering method for clustering objects that…

Methodology · Statistics 2020-05-19 Daniel K. Sewell , Yuguo Chen , William Bernhard , Tracy Sulkin

Biclustering is used for simultaneous clustering of the observations and variables when there is no group structure known \textit{a priori}. It is being increasingly used in bioinformatics, text analytics, etc. Previously, biclustering has…

Methodology · Statistics 2020-09-14 Wangshu Tu , Sanjeena Subedi
‹ Prev 1 2 3 10 Next ›