Related papers: A score function for Bayesian cluster analysis

Revisiting k-means: New Algorithms via Bayesian Nonparametrics

Bayesian models offer great flexibility for clustering applications---Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. For…

Machine Learning · Computer Science 2012-06-15 Brian Kulis , Michael I. Jordan

Selection of the Number of Clusters in Functional Data Analysis

Identifying the number $K$ of clusters in a dataset is one of the most difficult problems in clustering analysis. A choice of $K$ that correctly characterizes the features of the data is essential for building meaningful clusters. In this…

Methodology · Statistics 2019-05-06 Adriano Zanin Zambom , Julian A. Collazos , Ronaldo Dias

Estimating the number of clusters using cross-validation

Many clustering methods, including k-means, require the user to specify the number of clusters as an input parameter. A variety of methods have been devised to choose the number of clusters automatically, but they often rely on strong…

Methodology · Statistics 2017-02-10 Wei Fu , Patrick O. Perry

Comparison three methods of clustering: k-means, spectral clustering and hierarchical clustering

Comparison of three kind of the clustering and find cost function and loss function and calculate them. Error rate of the clustering methods and how to calculate the error percentage always be one on the important factor for evaluating the…

Machine Learning · Computer Science 2014-11-14 Kamran Kowsari

Bayesian Level Set Clustering

Classically, Bayesian clustering interprets each component of a mixture model as a cluster. The inferred clustering posterior is highly sensitive to any inaccuracies in the kernel within each component. As this kernel is made more flexible,…

Methodology · Statistics 2025-12-12 David Buch , Miheer Dewaskar , David B. Dunson

Simultaneous Estimation of Number of Clusters and Feature Sparsity in Clustering High-Dimensional Data

Estimating the number of clusters (K) is a critical and often difficult task in cluster analysis. Many methods have been proposed to estimate K, including some top performers using resampling approach. When performing cluster analysis in…

Methodology · Statistics 2019-09-05 Yujia Li , Xiangrui Zeng , Chien-Wei Lin , George Tseng

Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score

Cluster analysis requires many decisions: the clustering method and the implied reference model, the number of clusters and, often, several hyper-parameters and algorithms' tunings. In practice, one produces several partitions, and a final…

Machine Learning · Statistics 2023-08-14 Luca Coraggio , Pietro Coretto

An Analytical Approach to Document Clustering Based on Internal Criterion Function

Fast and high quality document clustering is an important task in organizing information, search engine results obtaining from user query, enhancing web crawling and information retrieval. With the large amount of data available and with a…

Information Retrieval · Computer Science 2010-03-11 Alok Ranjan , Harish Verma , Eatesh Kandpal , Joydip Dhar

Optimal Bayesian estimators for latent variable cluster models

In cluster analysis interest lies in probabilistically capturing partitions of individuals, items or observations into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior…

Methodology · Statistics 2017-03-23 Riccardo Rastelli , Nial Friel

Mean-field theory of Bayesian clustering

We show that model-based Bayesian clustering, the probabilistically most systematic approach to the partitioning of data, can be mapped into a statistical physics problem for a gas of particles, and as a result becomes amenable to a…

Disordered Systems and Neural Networks · Physics 2018-10-24 Alexander Mozeika , Anthony CC Coolen

Interpretable Clustering with the Distinguishability Criterion

Cluster analysis is a popular unsupervised learning tool used in many disciplines to identify heterogeneous sub-populations within a sample. However, validating cluster analysis results and determining the number of clusters in a data set…

Machine Learning · Statistics 2024-04-26 Ali Turfah , Xiaoquan Wen

A Clustering Method Based on Information Entropy Payload

Existing clustering algorithms such as K-means often need to preset parameters such as the number of categories K, and such parameters may lead to the failure to output objective and consistent clustering results. This paper introduces a…

Machine Learning · Computer Science 2022-09-15 Shaodong Deng , Long Sheng , Jiayi Nie , Fuyi Deng

Subspace clustering without knowing the number of clusters: A parameter free approach

Subspace clustering, the task of clustering high dimensional data when the data points come from a union of subspaces is one of the fundamental tasks in unsupervised machine learning. Most of the existing algorithms for this task require…

Machine Learning · Statistics 2020-10-28 Vishnu Menon , Gokularam M , Sheetal Kalyani

Fairness in Clustering with Multiple Sensitive Attributes

A clustering may be considered as fair on pre-specified sensitive attributes if the proportions of sensitive attribute groups in each cluster reflect that in the dataset. In this paper, we consider the task of fair clustering for scenarios…

Machine Learning · Computer Science 2020-01-27 Savitha Sam Abraham , Deepak P , Sowmya S Sundaram

A Bayesian Approach to Clustering via the Proper Bayesian Bootstrap: the Bayesian Bagged Clustering (BBC) algorithm

The paper presents a novel approach for unsupervised techniques in the field of clustering. A new method is proposed to enhance existing literature models using the proper Bayesian bootstrap to improve results in terms of robustness and…

Machine Learning · Statistics 2024-09-16 Federico Maria Quetti , Silvia Figini , Elena ballante

Multilinear objective function-based clustering

The input of most clustering algorithms is a symmetric matrix quantifying similarity within data pairs. Such a matrix is here turned into a quadratic set function measuring cluster score or similarity within data subsets larger than pairs.…

Discrete Mathematics · Computer Science 2015-09-30 Giovanni Rossi

Bayesian cluster analysis: Point estimation and credible balls

Clustering is widely studied in statistics and machine learning, with applications in a variety of fields. As opposed to classical algorithms which return a single clustering solution, Bayesian nonparametric models provide a posterior over…

Methodology · Statistics 2019-02-11 Sara Wade , Zoubin Ghahramani

Simple and Scalable Sparse k-means Clustering via Feature Ranking

Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters.…

Machine Learning · Statistics 2020-10-23 Zhiyue Zhang , Kenneth Lange , Jason Xu

Fully Bayesian Estimation under Dependent and Informative Cluster Sampling

Survey data are often collected under multistage sampling designs where units are binned to clusters that are sampled in a first stage. The unit-indexed population variables of interest are typically dependent within cluster. We propose a…

Methodology · Statistics 2021-08-26 Luis G. Leon-Novelo , Terrance D. Savitsky

Robust Bayesian Model Selection for Variable Clustering with the Gaussian Graphical Model

Variable clustering is important for explanatory analysis. However, only few dedicated methods for variable clustering with the Gaussian graphical model have been proposed. Even more severe, small insignificant partial correlations due to…

Applications · Statistics 2018-06-18 Daniel Andrade , Akiko Takeda , Kenji Fukumizu