Related papers: Cross-Study Replicability in Cluster Analysis

To Cluster, or Not to Cluster: An Analysis of Clusterability Methods

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present. As such, the study of clusterability,…

Machine Learning · Statistics 2018-10-30 A. Adolfsson , M. Ackerman , N. C. Brownstein

Replicability Across Multiple Studies

Meta-analysis is routinely performed in many scientific disciplines. This analysis is attractive since discoveries are possible even when all the individual studies are underpowered. However, the meta-analytic discoveries may be entirely…

Methodology · Statistics 2023-05-09 Marina Bogomolov , Ruth Heller

Clustering of Transcriptomic Data for the Identification of Cancer Subtypes

Cancer is a number of related yet highly heterogeneous diseases. Correct identification of cancer subtypes is critical for clinical decisions. The advance in sequencing technologies has made it possible to study cancer based on abundant…

Applications · Statistics 2018-11-27 Xiaochun Chen , Honggang Wang , Donghui Yan

Interpretable Clustering with the Distinguishability Criterion

Cluster analysis is a popular unsupervised learning tool used in many disciplines to identify heterogeneous sub-populations within a sample. However, validating cluster analysis results and determining the number of clusters in a data set…

Machine Learning · Statistics 2024-04-26 Ali Turfah , Xiaoquan Wen

Spatial clustering of array CGH features in combination with hierarchical multiple testing

We propose a new approach for clustering DNA features using array CGH data from multiple tumor samples. We distinguish data-collapsing: joining contiguous DNA clones or probes with extremely similar data into regions, from clustering:…

Applications · Statistics 2010-12-21 Kyung In Kim , Etienne Roquain , Mark Van De Wiel

A Short Survey on Data Clustering Algorithms

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…

Data Structures and Algorithms · Computer Science 2015-12-01 Ka-Chun Wong

Cluster validation by measurement of clustering characteristics relevant to the user

There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; "relative cluster validation" is about using such criteria…

Methodology · Statistics 2020-09-10 Christian Hennig

Bayesian Consensus Clustering

The task of clustering a set of objects based on multiple sources of data arises in several modern applications. We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These…

Machine Learning · Statistics 2015-12-01 Eric F. Lock , David B. Dunson

Assessing replicability of findings across two studies of multiple features

Replicability analysis aims to identify the findings that replicated across independent studies that examine the same features. We provide powerful novel replicability analysis procedures for two studies for FWER and for FDR control on the…

Methodology · Statistics 2019-03-01 Marina Bogomolov , Ruth Heller

Identification of relevant subtypes via preweighted sparse clustering

Cluster analysis methods are used to identify homogeneous subgroups in a data set. In biomedical applications, one frequently applies cluster analysis in order to identify biologically interesting subgroups. In particular, one may wish to…

Methodology · Statistics 2016-09-23 Sheila Gaynor , Eric Bair

An Effective and Efficient Approach for Clusterability Evaluation

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. Yet,…

Machine Learning · Computer Science 2016-02-24 Margareta Ackerman , Andreas Adolfsson , Naomi Brownstein

Simultaneous Estimation of Number of Clusters and Feature Sparsity in Clustering High-Dimensional Data

Estimating the number of clusters (K) is a critical and often difficult task in cluster analysis. Many methods have been proposed to estimate K, including some top performers using resampling approach. When performing cluster analysis in…

Methodology · Statistics 2019-09-05 Yujia Li , Xiangrui Zeng , Chien-Wei Lin , George Tseng

Parallel and Scalable Precise Clustering for Homologous Protein Discovery

This paper presents a new, parallel implementation of clustering and demonstrates its utility in greatly speeding up the process of identifying homologous proteins. Clustering is a technique to reduce the number of comparison needed to find…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-29 Stuart Byma , Akash Dhasade , Adrian Altenhoff , Christophe Dessimoz , James R. Larus

Extracting replicable associations across multiple studies: algorithms for controlling the false discovery rate

Extracting associations that recur across multiple studies while controlling the false discovery rate is a fundamental challenge. Here, we consider an extension of Efron's single-study two-groups model to allow joint analysis of multiple…

Methodology · Statistics 2019-01-14 David Amar , Ron Shamir , Daniel Yekutieli

Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes

A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice…

Methodology · Statistics 2020-06-24 Serhat Emre Akhanli , Christian Hennig

Reclustering: A New Method to Test the Appropriate Level of Clustering

When scholars suspect units are dependent on each other within clusters but independent of each other across clusters, they employ cluster-robust standard errors (CRSEs). Nevertheless, what to cluster over is sometimes unknown. For…

Methodology · Statistics 2025-11-12 Kentaro Fukumoto

Clustering validity based on the most similarity

One basic requirement of many studies is the necessity of classifying data. Clustering is a proposed method for summarizing networks. Clustering methods can be divided into two categories named model-based approaches and algorithmic…

Machine Learning · Computer Science 2013-02-19 Raheleh Namayandeh , Farzad Didehvar , Zahra Shojaei

Clustering Optimisation Method for Highly Connected Biological Data

Currently, data-driven discovery in biological sciences resides in finding segmentation strategies in multivariate data that produce sensible descriptions of the data. Clustering is but one of several approaches and sometimes falls short…

Quantitative Methods · Quantitative Biology 2022-08-12 Richard Tjörnhammar

An interpretable multiple kernel learning approach for the discovery of integrative cancer subtypes

Due to the complexity of cancer, clustering algorithms have been used to disentangle the observed heterogeneity and identify cancer subtypes that can be treated specifically. While kernel based clustering approaches allow the use of more…

Machine Learning · Statistics 2018-11-21 Nora K. Speicher , Nico Pfeifer

Interpretable Clustering: A Survey

In recent years, much of the research on clustering algorithms has primarily focused on enhancing their accuracy and efficiency, frequently at the expense of interpretability. However, as these methods are increasingly being applied in…

Machine Learning · Computer Science 2026-01-21 Lianyu Hu , Mudi Jiang , Junjie Dong , Xinying Liu , Zengyou He