Related papers: Variable selection for model-based clustering usin…

Variable selection in model-based clustering and discriminant analysis with a regularization approach

Relevant methods of variable selection have been proposed in model-based clustering and classification. These methods make use of backward or forward procedures to define the roles of the variables. Unfortunately, these stepwise…

Computation · Statistics 2017-05-03 Gilles Celeux, Cathy Maugis-Rabusseau, Mohammed Sedki

Variable selection for mixed data clustering: a model-based approach

We propose two approaches for selecting variables in latent class analysis (i.e., mixture model assuming within-component independence), which is the common model-based clustering method for mixed data. The first approach consists in…

Computation · Statistics 2017-03-08 Matthieu Marbac, Mohammed Sedki

Variable selection for clustering with Gaussian mixture models: state of the art

Mixture models have become widely used in clustering, given the probabilistic framework on which they are based. However, for modern databases that are characterized by their large size, these models behave disappointingly in setting out the…

Machine Learning · Statistics 2017-02-01 Abdelghafour Talibi, Boujemâa Achchab, Rafik Lasri

Variable Selection for Clustering and Classification

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…

Computation · Statistics 2013-03-22 Jeffrey L. Andrews, Paul D. McNicholas

Flexible Variable Selection for Clustering and Classification

The importance of variable selection for clustering has been recognized for some time, and mixture models are well-established as a statistical approach to clustering. Yet, the literature on variable selection in model-based clustering…

Methodology · Statistics 2024-02-13 Mackenzie R. Neal, Paul D. McNicholas

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie, Wei Pan, Xiaotong Shen

Variable selection using MM algorithms

Variable selection is fundamental to high-dimensional statistical modeling. Many variable selection techniques may be implemented by maximum penalized likelihood using various penalty functions. Optimizing the penalized likelihood function…

Statistics Theory · Mathematics 2007-06-13 David R. Hunter, Runze Li

Variable Selection Methods for Model-based Clustering

Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to…

Methodology · Statistics 2018-09-25 Michael Fop, Thomas Brendan Murphy

A Unified Framework for Variable Selection in Model-Based Clustering with Missing Not at Random

Model-based clustering integrated with variable selection is a powerful tool for uncovering latent structures within complex data. However, its effectiveness is often hindered by challenges such as identifying relevant variables that define…

Methodology · Statistics 2025-11-05 Binh H. Ho, Long Nguyen Chi, TrungTin Nguyen, Binh T. Nguyen, Van Ha Hoang, Christopher Drovandi

Flexible Clustering with a Sparse Mixture of Generalized Hyperbolic Distributions

Robust clustering of high-dimensional data is an important topic because clusters in real datasets are often heavy-tailed and/or asymmetric. Traditional approaches to model-based clustering often fail for high dimensional data, e.g., due to…

Methodology · Statistics 2024-06-07 Alexa A. Sochaniwsky, Michael P. B. Gallaugher, Yang Tang, Paul D. McNicholas

Clustering and variable selection for categorical multivariate data

This article investigates unsupervised classification techniques for categorical multivariate data. The study employs multivariate multinomial mixture modeling, which is a type of model particularly applicable to multilocus genotypic data.…

Statistics Theory · Mathematics 2014-03-11 Dominique Bontemps, Wilson Toussile

VARCLUST: clustering variables using dimensionality reduction

The VARCLUST algorithm is proposed for clustering variables under the assumption that variables in a given cluster are linear combinations of a small number of hidden latent variables, corrupted by random noise. The entire clustering task…

Computation · Statistics 2020-12-21 Piotr Sobczyk, Stanislaw Wilczynski, Malgorzata Bogdan, Piotr Graczyk, Julie Josse, Fabien Panloup, Valérie Seegers, Mateusz Staniak

Enhancing the selection of a model-based clustering with external qualitative variables

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved in clustering the data. An approach is proposed in the model-based clustering…

Methodology · Statistics 2013-07-18 Jean-Patrick Baudry, Margarida Cardoso, Gilles Celeux, Maria José Amorim, Ana Sousa Ferreira

Sparse model-based clustering of three-way data via lasso-type penalties

Mixtures of matrix Gaussian distributions provide a probabilistic framework for clustering continuous matrix-variate data, which are becoming increasingly prevalent in various fields. Despite its widespread adoption and successful…

Computation · Statistics 2023-07-21 Andrea Cappozzo, Alessandro Casa, Michael Fop

An ensemble learning method for variable selection: application to high dimensional data and missing values

Standard approaches for variable selection in linear models are not tailored to deal properly with high-dimensional and incomplete data. Currently, methods dedicated to high-dimensional data handle missing values by ad-hoc strategies, like…

Methodology · Statistics 2021-06-09 Avner Bar-Hen, Vincent Audigier

Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods

Penalized regression methods, such as lasso and elastic net, are used in many biomedical applications when simultaneous regression coefficient estimation and variable selection is desired. However, missing data complicates the…

Methodology · Statistics 2020-03-18 Jiacong Du, Jonathan Boss, Peisong Han, Lauren J Beesley, Stephen A Goutman, Stuart Batterman, Eva L Feldman, Bhramar Mukherjee

A model selection approach for clustering a multinomial sequence with non-negative factorization

We consider a problem of clustering a sequence of multinomial observations by way of a model selection criterion. We propose a form of a penalty term for the model selection procedure. Our approach subsumes both the conventional AIC and BIC…

Machine Learning · Statistics 2015-08-17 Nam H. Lee, Runze Tang, Carey E. Priebe, Michael Rosen

Discriminative variable selection for clustering with the sparse Fisher-EM algorithm

Interest in variable selection for clustering has increased recently due to the growing need to cluster high-dimensional data. In particular, variable selection eases both the clustering and the interpretation of the results.…

Methodology · Statistics 2012-04-11 Charles Bouveyron, Camille Brunet

Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering

We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a…

Applications · Statistics 2013-07-31 Gilles Celeux, Marie-Laure Martin-Magniette, Cathy Maugis-Rabusseau, Adrian E. Raftery

The Loss Rank Criterion for Variable Selection in Linear Regression Analysis

Lasso and other regularization procedures are attractive methods for variable selection, subject to a proper choice of shrinkage parameter. Given a set of potential subsets produced by a regularization algorithm, a consistent model…

Methodology · Statistics 2014-02-26 Minh-Ngoc Tran