Related papers: Model-based Clustering using Automatic Differentia…
Copulas provide a modular parameterization of multivariate distributions that decouples the modeling of marginals from the dependencies between them. The Gaussian Mixture Copula Model (GMCM) is a highly flexible copula that can model many kinds…
In this study, we consider unsupervised clustering of categorical vectors that can be of different sizes using mixtures. We use likelihood maximization to estimate the parameters of the underlying mixture model and a penalization technique to…
Finite Gaussian mixture models provide a powerful and widely employed probabilistic approach for clustering multivariate continuous data. However, the practical usefulness of these models is jeopardized in high-dimensional spaces, where…
Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…
Cluster analysis faces two problems in high dimensions: first, the "curse of dimensionality" that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large…
In this article, we discuss two specific classes of models (Gaussian Mixture Copula Models and Mixtures of Factor Analyzers) and the advantages of doing inference with gradient descent using automatic differentiation. Gaussian mixture…
Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…
Finite Gaussian mixture models are widely used for model-based clustering of continuous data. Nevertheless, since the number of model parameters scales quadratically with the number of variables, these models can be easily…
Initialisation of the EM algorithm in model-based clustering is often crucial. Various starting points in the parameter space often lead to different local maxima of the likelihood function and so to different clustering partitions. Among…
In this paper, we consider the task of clustering a set of individual time series while modeling each cluster, that is, model-based time series clustering. The task requires a parametric model with sufficient flexibility to describe the…
Robust clustering of high-dimensional data is an important topic because clusters in real datasets are often heavy-tailed and/or asymmetric. Traditional approaches to model-based clustering often fail for high dimensional data, e.g., due to…
Modeling high-dimensional data is very important for categorizing different classes. We develop a new mixture model called the Multinomial cluster-weighted model (MCWM). We derive the identifiability of a general class of MCWM. We estimate the…
Popular clustering algorithms based on usual distance functions (e.g., Euclidean distance) often suffer in high dimension, low sample size (HDLSS) situations, where concentration of pairwise distances has adverse effects on their…
The clustering of bounded data presents unique challenges in statistical analysis due to the constraints imposed on the data values. This paper introduces a novel method for model-based clustering specifically designed for bounded data.…
Model-based clustering approaches concern the paradigm of exploratory data analysis relying on the finite mixture model to automatically find a latent structure governing observed data. They are one of the most popular and successful…
This work introduces a refinement of the Parsimonious Model for fitting a Gaussian Mixture. The improvement is based on grouping the involved covariance matrices into clusters according to a criterion, such as sharing Principal…
A model involving Gaussian processes (GPs) is introduced to simultaneously handle multi-task learning, clustering, and prediction for multiple functional data. This procedure acts as a model-based clustering method for functional data as…
Constrained clustering has gained significant attention in the field of machine learning as it can leverage prior information on a growing amount of only partially labeled data. Following recent advances in deep generative models, we…
Training the parameters of statistical models to describe a given data set is a central task in the field of data mining and machine learning. A very popular and powerful way of parameter estimation is the method of maximum likelihood…
Clustering and dimensionality reduction have been crucial topics in machine learning and computer vision. Clustering high-dimensional data has been challenging for a long time due to the curse of dimensionality. For that reason, a more…