English
Related papers

Related papers: Pooled variable scaling for cluster analysis

200 papers

Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to…

Methodology · Statistics 2018-09-25 Michael Fop , Thomas Brendan Murphy

In cluster analysis, a common first step is to scale the data aiming to better partition them into clusters. Even though many different techniques have throughout many years been introduced to this end, it is probably fair to say that the…

Machine Learning · Computer Science 2023-05-30 Eduardo J. Aguilar , Valmir C. Barbosa

Clustering of proteins is of interest in cancer cell biology. This article proposes a hierarchical Bayesian model for protein (variable) clustering hinging on correlation structure. Starting from a multivariate normal likelihood, we enforce…

Computation · Statistics 2022-02-09 Riddhi Pratim Ghosh , Arnab Kumar Maity , Mohsen Pourahmadi , Bani K. Mallick

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie , Wei Pan , Xiaotong Shen

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…

Computation · Statistics 2013-03-22 Jeffrey L. Andrews , Paul D. McNicholas

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty.…

Methodology · Statistics 2016-12-23 Marbac Matthieu , Sedki Mohammed

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved to cluster the data. An approach is proposed in the model-based clustering…

VARCLUST algorithm is proposed for clustering variables under the assumption that variables in a given cluster are linear combinations of a small number of hidden latent variables, corrupted by the random noise. The entire clustering task…

Clustered standard errors and approximate randomization tests are popular inference methods that allow for dependence within observations. However, they require researchers to know the cluster structure ex ante. We propose a procedure to…

Econometrics · Economics 2022-01-14 Yong Cai

Biclustering algorithms play a central role in the biotechnological and biomedical domains. The knowledge extracted supports the extraction of putative regulatory modules, essential to understanding diseases, aiding therapy research, and…

Databases · Computer Science 2022-12-13 Leonardo Alexandre , Rafael S. Costa , Rui Henriques

Compared to supervised variable selection, the research on unsupervised variable selection is far behind. A forward partial-variable clustering full-variable loss (FPCFL) method is proposed for the corresponding challenges. An advantage is…

Methodology · Statistics 2024-12-02 Tonglin Zhang , Huyunting Huang

Variable selection for structured covariates lying on an underlying known graph is a problem motivated by practical applications, and has been a topic of increasing interest. However, most of the existing methods may not be scalable to high…

Methodology · Statistics 2016-04-27 Changgee Chang , Suprateek Kundu , Qi Long

Model-based clustering is widely used for identifying and distinguishing types of diseases. However, modern biomedical data coming with high dimensions make it challenging to perform the model estimation in traditional cluster analysis. The…

Methodology · Statistics 2025-07-22 Kazeem Kareem , Fan Dai

The mixture models have become widely used in clustering, given its probabilistic framework in which its based, however, for modern databases that are characterized by their large size, these models behave disappointingly in setting out the…

Machine Learning · Statistics 2017-02-01 Abdelghafour Talibi , Boujemâa Achchab , Rafik Lasri

We propose a novel perspective on varied-density clustering for high-dimensional data by framing it as a label propagation process in neighborhood graphs that adapt to local density variations. Our method formally connects density-based…

Machine Learning · Computer Science 2025-08-06 Ninh Pham , Yingtao Zheng , Hugo Phibbs

Pooled analyses that aggregate data from multiple studies are becoming increasingly common in collaborative epidemiologic research in order to increase the size and diversity of the study population. However, biomarker measurements from…

One potential solution to combat the scarcity of tail observations in extreme value analysis is to integrate information from multiple datasets sharing similar tail properties, for instance, a common extreme value index. In other words, for…

Methodology · Statistics 2025-06-25 Liujun Chen , Marco Oesting , Chen Zhou

Cluster analysis methods are used to identify homogeneous subgroups in a data set. In biomedical applications, one frequently applies cluster analysis in order to identify biologically interesting subgroups. In particular, one may wish to…

Methodology · Statistics 2016-09-23 Sheila Gaynor , Eric Bair

Matrix valued data has become increasingly prevalent in many applications. Most of the existing clustering methods for this type of data are tailored to the mean model and do not account for the dependence structure of the features, which…

Machine Learning · Statistics 2023-12-07 Inbeom Lee , Siyi Deng , Yang Ning

The objectives of this paper are to explore ways to analyze breast cancer dataset in the context of unsupervised learning without prior training model. The paper investigates different ways of clustering techniques as well as preprocessing.…

Machine Learning · Computer Science 2021-09-06 Somenath Chakraborty , Beddhu Murali
‹ Prev 1 2 3 10 Next ›