Related papers: Clustering inference in multiple groups

U-statistical inference for hierarchical clustering

Clustering methods are a valuable tool for the identification of patterns in high dimensional data with applications in many scientific problems. However, quantifying uncertainty in clustering is a challenging problem, particularly when…

Methodology · Statistics 2018-06-01 Marcio Valk, Gabriela Bettella Cybis

Clustering and Classification of Genetic Data Through U-Statistics

Genetic data are frequently categorical and have complex dependence structures that are not always well understood. For this reason, clustering and classification based on genetic data, while highly relevant, are challenging statistical…

Methodology · Statistics 2016-06-13 Gabriela Bettella Cybis, Marcio Valk, Silvia Regina Costa Lopes

Clustering Plotted Data by Image Segmentation

Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar…

Machine Learning · Computer Science 2021-10-12 Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou

Non-Parametric Cluster Significance Testing with Reference to a Unimodal Null Distribution

Cluster analysis is an unsupervised learning strategy that can be employed to identify subgroups of observations in data sets of unknown structure. This strategy is particularly useful for analyzing high-dimensional data such as microarray…

Methodology · Statistics 2016-10-07 Erika S. Helgeson, Eric Bair

Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach

Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grouped together with the aim of constructing well-established clusters whose elements are classified according to their similarity. The…

Machine Learning · Statistics 2023-10-20 Dimitrios Saligkaras, Vasileios E. Papageorgiou

Unsupervised Deep Discriminant Analysis Based Clustering

This work presents an unsupervised deep discriminant analysis for clustering. The method is based on deep neural networks and aims to minimize the intra-cluster discrepancy and maximize the inter-cluster discrepancy in an unsupervised…

Machine Learning · Computer Science 2022-06-13 Jinyu Cai, Wenzhong Guo, Jicong Fan

To Cluster, or Not to Cluster: An Analysis of Clusterability Methods

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present. As such, the study of clusterability,…

Machine Learning · Statistics 2018-10-30 A. Adolfsson, M. Ackerman, N. C. Brownstein

Truecluster: robust scalable clustering with model selection

Data-based classification is fundamental to most branches of science. While recent years have brought enormous progress in various areas of statistical computing and clustering, some general challenges in clustering remain: model selection,…

Artificial Intelligence · Computer Science 2007-06-13 Jens Oehlschlägel

Dying Clusters Is All You Need -- Deep Clustering With an Unknown Number of Clusters

Finding meaningful groups, i.e., clusters, in high-dimensional data such as images or texts without labeled data at hand is an important challenge in data mining. In recent years, deep clustering methods have achieved remarkable results in…

Machine Learning · Computer Science 2024-10-15 Collin Leiber, Niklas Strauß, Matthias Schubert, Thomas Seidl

Clustering with Confidence: Finding Clusters with Statistical Guarantees

Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or…

Machine Learning · Statistics 2017-01-02 Andreas Henelius, Kai Puolamäki, Henrik Boström, Panagiotis Papapetrou

Clustering-based collocation for uncertainty propagation with multivariate dependent inputs

In this article, we propose the use of partitioning and clustering methods as an alternative to Gaussian quadrature for stochastic collocation. The key idea is to use cluster centers as the nodes for collocation. In this way, we can extend…

Numerical Analysis · Mathematics 2019-04-16 A. W. Eggels, D. T. Crommelin, J. A. S. Witteveen

Evaluating network partitions through visualization

Network clustering requires making many decisions manually, such as the number of groups and a statistical model to be used. Even after filtering using an information criterion or regularizing with a nonparametric framework, we are commonly…

Social and Information Networks · Computer Science 2019-06-05 Chihiro Noguchi, Tatsuro Kawamoto

Quartile Clustering: A quartile based technique for Generating Meaningful Clusters

Clustering is one of the main tasks in exploratory data analysis and descriptive statistics where the main objective is partitioning observations in groups. Clustering has a broad range of application in varied domains like climate,…

Databases · Computer Science 2012-03-20 Saptarsi Goswami, Amlan Chakrabarti

Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

We propose a novel method for multiple clustering that assumes a co-clustering structure (partitions in both rows and columns of the data matrix) in each view. The new method is applicable to high-dimensional data. It is based on a…

Machine Learning · Statistics 2019-07-03 Tomoki Tokuda, Junichiro Yoshimoto, Yu Shimizu, Shigeru Toki, Go Okada, Masahiro Takamura, Tetsuya Yamamoto, Shinpei Yoshimura, Yasumasa Okamoto, Shigeto Yamawaki, Kenji Doya

Statistical Significance of Clustering using Soft Thresholding

Clustering methods have led to a number of important discoveries in bioinformatics and beyond. A major challenge in their use is determining which clusters represent important underlying structure, as opposed to spurious sampling artifacts.…

Methodology · Statistics 2021-10-20 Hanwen Huang, Yufeng Liu, Ming Yuan, J. S. Marron

Clust-Splitter - an Efficient Nonsmooth Optimization-Based Algorithm for Clustering Large Datasets

Clustering is a fundamental task in data mining and machine learning, particularly for analyzing large-scale data. In this paper, we introduce Clust-Splitter, an efficient algorithm based on nonsmooth optimization, designed to solve the…

Machine Learning · Computer Science 2026-03-19 Jenni Lampainen, Kaisa Joki, Napsu Karmitsa, Marko M. Mäkelä

High Dimensional Cluster Analysis Using Path Lengths

A hierarchical scheme for clustering data is presented which applies to spaces with a high number of dimensions ($N_D > 3$). The data set is first reduced to a smaller set of partitions (multi-dimensional bins). Multiple clustering…

Data Analysis, Statistics and Probability · Physics 2017-10-16 Kevin McIlhany, Stephen Wiggins

Particle Clustering Machine: A Dynamical System Based Approach

Identification of the clusters from an unlabeled data set is one of the most important problems in Unsupervised Machine Learning. The state of the art clustering algorithms are based on either the statistical properties or the geometric…

Machine Learning · Computer Science 2018-01-04 Sambarta Dasgupta, Keivan Ebrahimi, Umesh Vaidya

Clustering validity based on the most similarity

One basic requirement of many studies is the necessity of classifying data. Clustering is a proposed method for summarizing networks. Clustering methods can be divided into two categories named model-based approaches and algorithmic…

Machine Learning · Computer Science 2013-02-19 Raheleh Namayandeh, Farzad Didehvar, Zahra Shojaei

Statistical Significance for Hierarchical Clustering

Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other…

Methodology · Statistics 2014-11-20 Patrick K. Kimes, Yufeng Liu, D. Neil Hayes, J. S. Marron