Related papers: Pooled variable scaling for cluster analysis

Variable Selection Methods for Model-based Clustering

Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to…

Methodology · Statistics 2018-09-25 Michael Fop, Thomas Brendan Murphy

Shape complexity in cluster analysis

In cluster analysis, a common first step is to scale the data aiming to better partition them into clusters. Even though many different techniques have throughout many years been introduced to this end, it is probably fair to say that the…

Machine Learning · Computer Science 2023-05-30 Eduardo J. Aguilar, Valmir C. Barbosa

Adaptive Bayesian Variable Clustering via Structural Learning of Breast Cancer Data

Clustering of proteins is of interest in cancer cell biology. This article proposes a hierarchical Bayesian model for protein (variable) clustering hinging on correlation structure. Starting from a multivariate normal likelihood, we enforce…

Computation · Statistics 2022-02-09 Riddhi Pratim Ghosh, Arnab Kumar Maity, Mohsen Pourahmadi, Bani K. Mallick

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie, Wei Pan, Xiaotong Shen

Variable Selection for Clustering and Classification

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…

Computation · Statistics 2013-03-22 Jeffrey L. Andrews, Paul D. McNicholas

Variable selection for model-based clustering using the integrated complete-data likelihood

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…

Methodology · Statistics 2016-12-23 Marbac Matthieu, Sedki Mohammed

Enhancing the selection of a model-based clustering with external qualitative variables

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved to cluster the data. An approach is proposed in the model-based clustering…

Methodology · Statistics 2013-07-18 Jean-Patrick Baudry, Margarida Cardoso, Gilles Celeux, Maria José Amorim, Ana Sousa Ferreira

VARCLUST: clustering variables using dimensionality reduction

VARCLUST algorithm is proposed for clustering variables under the assumption that variables in a given cluster are linear combinations of a small number of hidden latent variables, corrupted by the random noise. The entire clustering task…

Computation · Statistics 2020-12-21 Piotr Sobczyk, Stanislaw Wilczynski, Malgorzata Bogdan, Piotr Graczyk, Julie Josse, Fabien Panloup, Valérie Seegers, Mateusz Staniak

Panel Data with Unknown Clusters

Clustered standard errors and approximate randomization tests are popular inference methods that allow for dependence within observations. However, they require researchers to know the cluster structure ex ante. We propose a procedure to…

Econometrics · Economics 2022-01-14 Yong Cai

Scaling pattern mining through non-overlapping variable partitioning

Biclustering algorithms play a central role in the biotechnological and biomedical domains. The knowledge extracted supports the extraction of putative regulatory modules, essential to understanding diseases, aiding therapy research, and…

Databases · Computer Science 2022-12-13 Leonardo Alexandre, Rafael S. Costa, Rui Henriques

Unsupervised Variable Selection for Ultrahigh-Dimensional Clustering Analysis

Compared to supervised variable selection, the research on unsupervised variable selection is far behind. A forward partial-variable clustering full-variable loss (FPCFL) method is proposed for the corresponding challenges. An advantage is…

Methodology · Statistics 2024-12-02 Tonglin Zhang, Huyunting Huang

Scalable Bayesian Variable Selection for Structured High-dimensional Data

Variable selection for structured covariates lying on an underlying known graph is a problem motivated by practical applications, and has been a topic of increasing interest. However, most of the existing methods may not be scalable to high…

Methodology · Statistics 2016-04-27 Changgee Chang, Suprateek Kundu, Qi Long

A Hybrid Mixture Approach for Clustering and Characterizing Cancer Data

Model-based clustering is widely used for identifying and distinguishing types of diseases. However, modern biomedical data coming with high dimensions make it challenging to perform the model estimation in traditional cluster analysis. The…

Methodology · Statistics 2025-07-22 Kazeem Kareem, Fan Dai

Variable selection for clustering with Gaussian mixture models: state of the art

Mixture models have become widely used in clustering, given the probabilistic framework on which they are based. However, for modern databases that are characterized by their large size, these models behave disappointingly in setting out the…

Machine Learning · Statistics 2017-02-01 Abdelghafour Talibi, Boujemâa Achchab, Rafik Lasri

Scalable Varied-Density Clustering via Graph Propagation

We propose a novel perspective on varied-density clustering for high-dimensional data by framing it as a label propagation process in neighborhood graphs that adapt to local density variations. Our method formally connects density-based…

Machine Learning · Computer Science 2025-08-06 Ninh Pham, Yingtao Zheng, Hugo Phibbs

Statistical method for pooling categorical biomarkers from multi-center matched/nested case-control studies

Pooled analyses that aggregate data from multiple studies are becoming increasingly common in collaborative epidemiologic research in order to increase the size and diversity of the study population. However, biomarker measurements from…

Methodology · Statistics 2025-05-06 Yujie Wu, Xiao Wu, Mitchell H. Gail, Regina G. Ziegler, Stephanie A. Smith-Warner, Molin Wang

Clustering Tails in High Dimension

One potential solution to combat the scarcity of tail observations in extreme value analysis is to integrate information from multiple datasets sharing similar tail properties, for instance, a common extreme value index. In other words, for…

Methodology · Statistics 2025-06-25 Liujun Chen, Marco Oesting, Chen Zhou

Identification of relevant subtypes via preweighted sparse clustering

Cluster analysis methods are used to identify homogeneous subgroups in a data set. In biomedical applications, one frequently applies cluster analysis in order to identify biologically interesting subgroups. In particular, one may wish to…

Methodology · Statistics 2016-09-23 Sheila Gaynor, Eric Bair

Optimal Variable Clustering for High-Dimensional Matrix Valued Data

Matrix valued data has become increasingly prevalent in many applications. Most of the existing clustering methods for this type of data are tailored to the mean model and do not account for the dependence structure of the features, which…

Machine Learning · Statistics 2023-12-07 Inbeom Lee, Siyi Deng, Yang Ning

Investigate the Correlation of Breast Cancer Dataset using Different Clustering Technique

The objectives of this paper are to explore ways to analyze breast cancer dataset in the context of unsupervised learning without prior training model. The paper investigates different ways of clustering techniques as well as preprocessing.…

Machine Learning · Computer Science 2021-09-06 Somenath Chakraborty, Beddhu Murali