Related papers: A sequential algorithm for fast fitting of Dirichl…

Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics

The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang and Dunson,…

Methodology · Statistics 2018-10-15 Oliver M. Crook, Laurent Gatto, Paul D. W. Kirk

Bayesian Mixture Models for Frequent Itemset Discovery

In binary-transaction data-mining, traditional frequent itemset mining often produces results which are not straightforward to interpret. To overcome this problem, probability models are often used to produce more compact and conclusive…

Machine Learning · Computer Science 2012-09-27 Ruofei He, Jonathan Shapiro

Fast search for Dirichlet process mixture models

Dirichlet process (DP) mixture models provide a flexible Bayesian framework for density estimation. Unfortunately, their flexibility comes at a cost: inference in DP mixture models is computationally expensive, even when conjugate…

Machine Learning · Computer Science 2009-07-13 Hal Daumé

Variational approximation for mixtures of linear mixed models

Mixtures of linear mixed models (MLMMs) are useful for clustering grouped data and can be estimated by likelihood maximization through the EM algorithm. The conventional approach to determining a suitable number of components is to compare…

Applications · Statistics 2014-05-26 Siew Li Tan, David J. Nott

Dirichlet Process Mixtures of Generalized Mallows Models

We present a Dirichlet process mixture model over discrete incomplete rankings and study two Gibbs sampling inference techniques for estimating posterior clusterings. The first approach uses a slice sampling subcomponent for estimating…

Machine Learning · Computer Science 2012-03-19 Marina Meila, Harr Chen

Mixed data Deep Gaussian Mixture Model: A clustering model for mixed datasets

Clustering mixed data presents numerous challenges inherent to the very heterogeneous nature of the variables. A clustering algorithm should be able, despite this heterogeneity, to extract discriminant pieces of information from the…

Machine Learning · Computer Science 2022-05-10 Robin Fuchs, Denys Pommeret, Cinzia Viroli

MUSS: Multilevel Subset Selection for Relevance and Diversity

The problem of relevant and diverse subset selection has a wide range of applications, including recommender systems and retrieval-augmented generation (RAG). For example, in recommender systems, one is interested in selecting relevant…

Machine Learning · Computer Science 2026-03-10 Vu Nguyen, Andrey Kan

Distributed Collapsed Gibbs Sampler for Dirichlet Process Mixture Models in Federated Learning

Dirichlet Process Mixture Models (DPMMs) are widely used to address clustering problems. Their main advantage lies in their ability to automatically estimate the number of clusters during the inference process through the Bayesian…

Machine Learning · Statistics 2023-12-19 Reda Khoufache, Mustapha Lebbah, Hanene Azzag, Etienne Goffinet, Djamel Bouchaffra

Distributed Weighted Matching via Randomized Composable Coresets

Maximum weight matching is one of the most fundamental combinatorial optimization problems with a wide range of applications in data mining and bioinformatics. Developing distributed weighted matching algorithms is challenging due to the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-06 Sepehr Assadi, MohammadHossein Bateni, Vahab Mirrokni

ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures

The Dirichlet process (DP) is a fundamental mathematical tool for Bayesian nonparametric modeling, and is widely used in tasks such as density estimation, natural language processing, and time series modeling. Although MCMC inference…

Machine Learning · Statistics 2013-04-09 Dan Lovell, Jonathan Malmaud, Ryan P. Adams, Vikash K. Mansinghka

Infinite mixtures of multivariate normal-inverse Gaussian distributions for clustering of skewed data

Mixtures of multivariate normal inverse Gaussian (MNIG) distributions can be used to cluster data that exhibit features such as skewness and heavy tails. However, for cluster analysis, using a traditional finite mixture model framework,…

Methodology · Statistics 2020-05-13 Yuan Fang, Dimitris Karlis, Sanjeena Subedi

The Greedy Dirichlet Process Filter - An Online Clustering Multi-Target Tracker

Reliable collision avoidance is one of the main requirements for autonomous driving. Hence, it is important to correctly estimate the states of an unknown number of static and dynamic objects in real-time. Here, data association is a major…

Computer Vision and Pattern Recognition · Computer Science 2019-03-11 Benjamin Naujoks, Patrick Burger, Hans-Joachim Wuensche

Posterior Approximation using Stochastic Gradient Ascent with Adaptive Stepsize

Scalable posterior approximation algorithms allow Bayesian nonparametric models such as the Dirichlet process mixture to scale to larger datasets at a fraction of the cost. Recent algorithms, notably stochastic variational inference, perform local…

Machine Learning · Computer Science 2025-02-25 Kart-Leong Lim, Xudong Jiang

Updating Variational Bayes: Fast sequential posterior inference

Variational Bayesian (VB) methods produce posterior inference in a time frame considerably smaller than traditional Markov Chain Monte Carlo approaches. Although the VB posterior is an approximation, it has been shown to produce good…

Computation · Statistics 2019-08-02 Nathaniel Tomasetti, Catherine S. Forbes, Anastasios Panagiotelis

Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models

We develop a sequential low-complexity inference procedure for Dirichlet process mixtures of Gaussians for online clustering and parameter estimation when the number of clusters is unknown a priori. We present an easily computable, closed…

Machine Learning · Statistics 2015-09-15 Theodoros Tsiligkaridis, Keith W. Forsythe

Simple approximate MAP Inference for Dirichlet processes

The Dirichlet process mixture (DPM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as…

Machine Learning · Statistics 2014-11-05 Yordan P. Raykov, Alexis Boukouvalas, Max A. Little

CPU- and GPU-based Distributed Sampling in Dirichlet Process Mixtures for Large-scale Analysis

In the realm of unsupervised learning, Bayesian nonparametric mixture models, exemplified by the Dirichlet Process Mixture Model (DPMM), provide a principled approach for adapting the complexity of the model to the data. Such models are…

Machine Learning · Computer Science 2022-04-20 Or Dinari, Raz Zamir, John W. Fisher, Oren Freifeld

A Random Finite Set Model for Data Clustering

The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a…

Machine Learning · Statistics 2017-03-16 Dinh Phung, Ba-Ngu Vo

Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance…

Machine Learning · Computer Science 2013-11-04 Trevor Campbell, Miao Liu, Brian Kulis, Jonathan P. How, Lawrence Carin

On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions

Modern datasets span billions of samples, making training on all available data infeasible. Selecting a high quality subset helps in reducing training costs and enhancing model quality. Submodularity, a discrete analogue of convexity, is…

Machine Learning · Computer Science 2025-04-04 Maximilian Böther, Abraham Sebastian, Pranjal Awasthi, Ana Klimovic, Srikumar Ramalingam