Related papers: VARCLUST: clustering variables using dimensionalit…

Truecluster: robust scalable clustering with model selection

Data-based classification is fundamental to most branches of science. While recent years have brought enormous progress in various areas of statistical computing and clustering, some general challenges in clustering remain: model selection,…

Artificial Intelligence · Computer Science 2007-06-13 Jens Oehlschlägel

Variable selection for model-based clustering using the integrated complete-data likelihood

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…

Methodology · Statistics 2016-12-23 Marbac Matthieu, Sedki Mohammed

Variable Selection for Clustering and Classification

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…

Computation · Statistics 2013-03-22 Jeffrey L. Andrews, Paul D. McNicholas

ClustOfVar: An R Package for the Clustering of Variables

Clustering of variables is a way to arrange variables into homogeneous clusters, i.e., groups of variables which are strongly related to each other and thus bring the same information. These approaches can then be useful for dimension…

Computation · Statistics 2012-12-12 M. Chavent, V. Kuentz, B. Liquet, L. Saracco

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie, Wei Pan, Xiaotong Shen

visClust: A visual clustering algorithm based on orthogonal projections

We present a novel clustering algorithm, visClust, that is based on lower dimensional data representations and visual interpretation. Thereto, we design a transformation that allows the data to be represented by a binary integer array…

Computer Vision and Pattern Recognition · Computer Science 2024-05-31 Anna Breger, Clemens Karner, Martin Ehler

Flexible Variable Selection for Clustering and Classification

The importance of variable selection for clustering has been recognized for some time, and mixture models are well-established as a statistical approach to clustering. Yet, the literature on variable selection in model-based clustering…

Methodology · Statistics 2024-02-13 Mackenzie R. Neal, Paul D. McNicholas

VC-PCR: A Prediction Method based on Supervised Variable Selection and Clustering

Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have cluster structure (e.g. there are highly correlated groups of variables). To improve prediction accuracy, various methods have been…

Machine Learning · Statistics 2022-02-03 Rebecca Marion, Johannes Lederer, Bernadette Govaerts, Rainer von Sachs

Variable Selection Methods for Model-based Clustering

Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to…

Methodology · Statistics 2018-09-25 Michael Fop, Thomas Brendan Murphy

Combining clustering of variables and feature selection using random forests

Standard approaches to tackle high-dimensional supervised classification problems often include variable selection and dimension reduction procedures. The novel methodology proposed in this paper combines clustering of variables and feature…

Statistics Theory · Mathematics 2018-11-07 Marie Chavent, Robin Genuer, Jerome Saracco

Clustering in Partially Labeled Stochastic Block Models via Total Variation Minimization

A main task in data analysis is to organize data points into coherent groups or clusters. The stochastic block model is a probabilistic model for the cluster structure. This model prescribes different probabilities for the presence of edges…

Machine Learning · Computer Science 2020-09-24 Alexander Jung

Optimal Bayesian estimators for latent variable cluster models

In cluster analysis interest lies in probabilistically capturing partitions of individuals, items or observations into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior…

Methodology · Statistics 2017-03-23 Riccardo Rastelli, Nial Friel

A pairwise likelihood approach to simultaneous clustering and dimensional reduction of ordinal data

The literature on clustering for continuous data is rich and wide; in contrast, the literature developed for categorical data is still limited. In some cases, the problem is made more difficult by the presence of noise variables/dimensions that…

Methodology · Statistics 2015-04-14 Monia Ranalli, Roberto Rocci

Robust subspace clustering

Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC) [In…

Machine Learning · Computer Science 2014-05-26 Mahdi Soltanolkotabi, Ehsan Elhamifar, Emmanuel J. Candès

Sparse Bayesian Hierarchical Modeling of High-dimensional Clustering Problems

Clustering is one of the most widely used procedures in the analysis of microarray data, for example with the goal of discovering cancer subtypes based on observed heterogeneity of genetic marks between different tissues. It is well-known…

Methodology · Statistics 2009-04-21 Heng Lian

Discriminative variable selection for clustering with the sparse Fisher-EM algorithm

Interest in variable selection for clustering has increased recently due to the growing need to cluster high-dimensional data. In particular, variable selection eases both the clustering and the interpretation of the results.…

Methodology · Statistics 2012-04-11 Charles Bouveyron, Camille Brunet

clustvarsel: A Package Implementing Variable Selection for Model-based Clustering in R

Finite mixture modelling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provide…

Computation · Statistics 2014-11-04 Luca Scrucca, Adrian E. Raftery

Scaling pattern mining through non-overlapping variable partitioning

Biclustering algorithms play a central role in the biotechnological and biomedical domains. The knowledge extracted supports the extraction of putative regulatory modules, essential to understanding diseases, aiding therapy research, and…

Databases · Computer Science 2022-12-13 Leonardo Alexandre, Rafael S. Costa, Rui Henriques

Robust Bayesian Cluster Enumeration Based on the $t$ Distribution

A major challenge in cluster analysis is that the number of data clusters is mostly unknown and must be estimated prior to clustering the observed data. In real-world applications, the observed data is often subject to heavy-tailed noise…

Machine Learning · Statistics 2020-05-06 Freweyni K. Teklehaymanot, Michael Muma, Abdelhak M. Zoubir

A Theoretical Analysis of Noisy Sparse Subspace Clustering on Dimensionality-Reduced Data

Subspace clustering is the problem of partitioning unlabeled data points into a number of clusters so that data points within one cluster lie approximately on a low-dimensional linear subspace. In many practical scenarios, the…

Machine Learning · Statistics 2019-01-24 Yining Wang, Yu-Xiang Wang, Aarti Singh