Related papers: Minimum Message Length Clustering Using Gibbs Samp…

The Informativeness of K-Means for Learning Mixture Models

The learning of mixture models can be viewed as a clustering problem. Indeed, given data samples independently generated from a mixture of distributions, we often would like to find the correct target clustering of the samples…

Machine Learning · Statistics 2022-08-26 Zhaoqiang Liu, Vincent Y. F. Tan

Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions

Mixture modelling involves explaining some observed evidence using a combination of probability distributions. The crux of the problem is the inference of an optimal number of mixture components and their corresponding parameters. This…

Machine Learning · Computer Science 2015-03-02 Parthan Kasarapu, Lloyd Allison

Near-Optimal Detection in MIMO Systems using Gibbs Sampling

In this paper we study a Markov Chain Monte Carlo (MCMC) Gibbs sampler for solving the integer least-squares problem. In digital communication the problem is equivalent to performing Maximum Likelihood (ML) detection in Multiple-Input…

Information Theory · Computer Science 2009-10-09 Morten Hansen, Babak Hassibi, Alexandros G. Dimakis, Weiyu Xu

Identifying the number of clusters in discrete mixture models

Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. In this paper, we propose a…

Methodology · Statistics 2014-09-29 Cláudia Silvestre, Margarida G. M. S. Cardoso, Mário A. T. Figueiredo

A Novel Algorithm for Clustering of Data on the Unit Sphere via Mixture Models

A new maximum approximate likelihood (ML) estimation algorithm for mixtures of Kent distributions is proposed. The new algorithm is constructed via the BSLM (block successive lower-bound maximization) framework and incorporates manifold…

Computation · Statistics 2017-09-15 Hien D. Nguyen

Modelling of directional data using Kent distributions

The modelling of data on a spherical surface requires the consideration of directional probability distributions. To model asymmetrically distributed data on a three-dimensional sphere, Kent distributions are often used. The moment…

Machine Learning · Computer Science 2015-06-29 Parthan Kasarapu

Machine Learning for Genomic Data

This report explores the application of machine learning techniques on short time-series gene expression data. Although standard machine learning algorithms work well on longer time series, they often fail to find meaningful insights from…

Genomics · Quantitative Biology 2021-11-17 Akankshita Dash

Model Selection and Parameter Estimation of Multi-dimensional Gaussian Mixture Model

In this paper, we study the problem of learning multi-dimensional Gaussian Mixture Models (GMMs), with a specific focus on model order selection and efficient mixing distribution estimation. We first establish an information-theoretic lower…

Machine Learning · Statistics 2026-03-23 Xinyu Liu, Hai Zhang

Multi-Prototypes Convex Merging Based K-Means Clustering Algorithm

K-Means algorithm is a popular clustering method. However, it has two limitations: 1) it gets stuck easily in spurious local minima, and 2) the number of clusters k has to be given a priori. To solve these two issues, a multi-prototypes…

Machine Learning · Computer Science 2023-02-15 Dong Li, Shuisheng Zhou, Tieyong Zeng, Raymond H. Chan

Revisiting k-means: New Algorithms via Bayesian Nonparametrics

Bayesian models offer great flexibility for clustering applications: Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. For…

Machine Learning · Computer Science 2012-06-15 Brian Kulis, Michael I. Jordan

Hard-Clustering with Gaussian Mixture Models

Training the parameters of statistical models to describe a given data set is a central task in the field of data mining and machine learning. A very popular and powerful way of parameter estimation is the method of maximum likelihood…

Machine Learning · Computer Science 2016-03-22 Johannes Blömer, Sascha Brauer, Kathrin Bujna

Mixtures of Bivariate von Mises Distributions with Applications to Modelling of Protein Dihedral Angles

The modelling of empirically observed data is commonly done using mixtures of probability distributions. In order to model angular data, a directional probability distribution such as the bivariate von Mises (BVM) is typically used. The…

Machine Learning · Statistics 2016-09-27 Parthan Kasarapu

A discomfort-informed adaptive Gibbs sampler for finite mixture models

Finite mixture models are frequently used to uncover latent structures in high-dimensional datasets (e.g. identifying clusters of patients in electronic health records). The inference of such structures can be performed in a Bayesian…

Methodology · Statistics 2025-12-02 Davide Fabbrico, Andi Q. Wang, Sebastiano Grazzi, Alice Corbella, Gareth O. Roberts, Sylvia Richardson, Filippo Pagani, Paul D. W. Kirk

Accelerated Markov Chain Monte Carlo Using Adaptive Weighting Scheme

Gibbs sampling is one of the most commonly used Markov Chain Monte Carlo (MCMC) algorithms due to its simplicity and efficiency. It cycles through the latent variables, sampling each one from its distribution conditional on the current…

Machine Learning · Computer Science 2024-08-26 Yanbo Wang, Wenyu Chen, Shimin Shan

On particle Gibbs sampling

The particle Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm to sample from the full posterior distribution of a state-space model. It does so by executing Gibbs sampling steps on an extended target distribution defined on the…

Computation · Statistics 2015-07-29 Nicolas Chopin, Sumeetpal S. Singh

$K$-Means and Gaussian Mixture Modeling with a Separation Constraint

We consider the problem of clustering with $K$-means and Gaussian mixture models with a constraint on the separation between the centers in the context of real-valued data. We first propose a dynamic programming approach to solving the…

Computation · Statistics 2023-01-24 He Jiang, Ery Arias-Castro

Ten Steps of EM Suffice for Mixtures of Two Gaussians

The Expectation-Maximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. For estimating mixtures of Gaussians, its iteration can be viewed as a soft version of the k-means…

Machine Learning · Statistics 2017-06-06 Constantinos Daskalakis, Christos Tzamos, Manolis Zampetakis

Rethinking LLM Ensembling from the Perspective of Mixture Models

Model ensembling is a well-established technique for improving the performance of machine learning models. Conventionally, this involves averaging the output distributions of multiple models and selecting the most probable label. This idea…

Machine Learning · Computer Science 2026-05-26 Jiale Fu, Yuchu Jiang, Peijun Wu, Chonghan Liu, Joey Tianyi Zhou, Xu Yang

An Observation on Lloyd's k-Means Algorithm in High Dimensions

Clustering and estimating cluster means are core problems in statistics and machine learning, with k-means and Expectation Maximization (EM) being two widely used algorithms. In this work, we provide a theoretical explanation for the…

Machine Learning · Statistics 2025-06-19 David Silva-Sánchez, Roy R. Lederman

HG-means: A scalable hybrid genetic algorithm for minimum sum-of-squares clustering

Minimum sum-of-squares clustering (MSSC) is a widely used clustering model, of which the popular K-means algorithm constitutes a local minimizer. It is well known that the solutions of K-means can be arbitrarily distant from the true MSSC…

Machine Learning · Computer Science 2018-12-21 Daniel Gribel, Thibaut Vidal