Related papers: Optimal Variable Clustering for High-Dimensional M…

Large covariance matrix estimation with factor-assisted variable clustering

This paper studies the covariance matrix estimation for high-dimensional time series within a new framework that combines low-rank factor and latent variable-specific cluster structures. The popular methods based on assuming the sparse…

Methodology · Statistics 2025-02-25 Dong Li, Xinghao Qiao, Cheng Yu

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie, Wei Pan, Xiaotong Shen

A Fast Algorithm for Clustering High Dimensional Feature Vectors

We propose an algorithm for clustering high dimensional data. If $P$ features for $N$ objects are represented in an $N\times P$ matrix ${\bf X}$, where $N\ll P$, the method is based on exploiting the cluster-dependent structure of the…

Machine Learning · Statistics 2018-11-05 Shahina Rahman, Valen E. Johnson

Multinomial Cluster-Weighted Models for High-Dimensional Data

Modeling of high-dimensional data is very important to categorize different classes. We develop a new mixture model called Multinomial cluster-weighted model (MCWM). We derive the identifiability of a general class of MCWM. We estimate the…

Methodology · Statistics 2022-08-25 Kehinde Olobatuyi, Oludare Ariyo

A Family of Mixture Models for Biclustering

Biclustering is used for simultaneous clustering of the observations and variables when there is no group structure known a priori. It is being increasingly used in bioinformatics, text analytics, etc. Previously, biclustering has…

Methodology · Statistics 2020-09-14 Wangshu Tu, Sanjeena Subedi

Variable Selection for Clustering and Classification

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…

Computation · Statistics 2013-03-22 Jeffrey L. Andrews, Paul D. McNicholas

Weighted total variation based convex clustering

Data clustering is a fundamental problem with a wide range of applications. Standard methods, e.g., the $k$-means method, usually require solving a non-convex optimization problem. Recently, total variation based convex relaxation to the…

Optimization and Control · Mathematics 2018-08-29 Guodong Xu, Yu Xia, Hui Ji

Optimal Clustering by Lloyd Algorithm for Low-Rank Mixture Model

This paper investigates the computational and statistical limits in clustering matrix-valued observations. We propose a low-rank mixture model (LrMM), adapted from the classical Gaussian mixture model (GMM) to treat matrix-valued…

Statistics Theory · Mathematics 2023-06-08 Zhongyuan Lyu, Dong Xia

Optimal Clustering in Anisotropic Gaussian Mixture Models

We study the clustering task under anisotropic Gaussian Mixture Models where the covariance matrices from different clusters are unknown and are not necessarily the identical matrix. We characterize the dependence of signal-to-noise ratios…

Statistics Theory · Mathematics 2021-01-19 Xin Chen, Anderson Y. Zhang

Clustering and Classification via Cluster-Weighted Factor Analyzers

In model-based clustering and classification, the cluster-weighted model constitutes a convenient approach when the random vector of interest constitutes a response variable Y and a set p of explanatory variables X. However, its…

Methodology · Statistics 2013-07-23 Sanjeena Subedi, Antonio Punzo, Salvatore Ingrassia, Paul D. McNicholas

Biconvex Clustering

Convex clustering has recently garnered increasing interest due to its attractive theoretical and computational properties, but its merits become limited in the face of high-dimensional data. In such settings, pairwise affinity terms that…

Methodology · Statistics 2021-04-02 Saptarshi Chakraborty, Jason Xu

Model Based Clustering of High-Dimensional Binary Data

We propose a mixture of latent trait models with common slope parameters (MCLT) for model-based clustering of high-dimensional binary data, a data type for which few established methods exist. Recent work on clustering of binary data, based…

Methodology · Statistics 2017-10-09 Yang Tang, Ryan P. Browne, Paul D. McNicholas

Variable selection for model-based clustering using the integrated complete-data likelihood

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…

Methodology · Statistics 2016-12-23 Marbac Matthieu, Sedki Mohammed

Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation

We study the clustering problem for mixtures of bounded covariance distributions, under a fine-grained separation assumption. Specifically, given samples from a $k$-component mixture distribution $D = \sum_{i =1}^k w_i P_i$, where each $w_i…

Machine Learning · Computer Science 2023-12-20 Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Thanasis Pittas

Matrix Normal Cluster-Weighted Models

Finite mixtures of regressions with fixed covariates are a commonly used model-based clustering methodology to deal with regression data. However, they assume assignment independence, i.e. the allocation of data points to the clusters is…

Methodology · Statistics 2021-04-27 Salvatore D. Tomarchio, Paul D. McNicholas, Antonio Punzo

Model-based clustering for covariance matrices via penalized Wishart mixture models

Covariance matrices provide a valuable source of information about complex interactions and dependencies within the data. However, from a clustering perspective, this information has often been underutilized and overlooked. Indeed, commonly…

Methodology · Statistics 2024-09-02 Andrea Cappozzo, Alessandro Casa

Optimal Clustering with Missing Values

Missing values frequently arise in modern biomedical studies due to various reasons, including missing tests or complex profiling technologies for different omics measurements. Missing values can complicate the application of clustering…

Machine Learning · Statistics 2019-02-27 Shahin Boluki, Siamak Zamani Dadaneh, Xiaoning Qian, Edward R. Dougherty

Variable selection for mixed data clustering: a model-based approach

We propose two approaches for selecting variables in latent class analysis (i.e., mixture model assuming within component independence), which is the common model-based clustering method for mixed data. The first approach consists in…

Computation · Statistics 2017-03-08 Matthieu Marbac, Mohammed Sedki

VARCLUST: clustering variables using dimensionality reduction

VARCLUST algorithm is proposed for clustering variables under the assumption that variables in a given cluster are linear combinations of a small number of hidden latent variables, corrupted by the random noise. The entire clustering task…

Computation · Statistics 2020-12-21 Piotr Sobczyk, Stanislaw Wilczynski, Malgorzata Bogdan, Piotr Graczyk, Julie Josse, Fabien Panloup, Valérie Seegers, Mateusz Staniak

Clustering Longitudinal Ordinal Data via Finite Mixture of Matrix-Variate Distributions

In social sciences, studies are often based on questionnaires asking participants to express ordered responses several times over a study period. We present a model-based clustering algorithm for such longitudinal ordinal data. Assuming…

Methodology · Statistics 2024-01-29 Francesco Amato, Julien Jacques, Isabelle Prim-Allaz