Related papers: Sampling in Dirichlet Process Mixture Models for C…

Dirichlet process mixture models for non-stationary data streams

In recent years, we have seen a handful of works on inference algorithms over non-stationary data streams. Given their flexibility, Bayesian non-parametric models are good candidates for these scenarios. However, reliable streaming…

Machine Learning · Statistics 2022-10-14 Ioar Casado, Aritz Pérez

Streaming Variational Inference for Bayesian Nonparametric Mixture Models

In theory, Bayesian nonparametric (BNP) models are well suited to streaming data scenarios due to their ability to adapt model complexity with the observed data. Unfortunately, such benefits have not been fully realized in practice;…

Machine Learning · Statistics 2015-04-22 Alex Tank, Nicholas J. Foti, Emily B. Fox

Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data

We consider the estimation of Dirichlet Process Mixture Models (DPMMs) in distributed environments, where data are distributed across multiple computing nodes. A key advantage of Bayesian nonparametric models such as DPMMs is that they…

Machine Learning · Statistics 2017-09-20 Ruohui Wang, Dahua Lin

Parallel Clustering of Single Cell Transcriptomic Data with Split-Merge Sampling on Dirichlet Process Mixtures

Motivation: With the development of droplet based systems, massive single cell transcriptome data has become available, which enables analysis of cellular and molecular processes at single cell resolution and is instrumental to…

Machine Learning · Computer Science 2018-12-27 Tiehang Duan, José P. Pinto, Xiaohui Xie

Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance…

Machine Learning · Computer Science 2013-11-04 Trevor Campbell, Miao Liu, Brian Kulis, Jonathan P. How, Lawrence Carin

A Clustering-based Framework for Classifying Data Streams

The non-stationary nature of data streams strongly challenges traditional machine learning techniques. Although some solutions have been proposed to extend traditional machine learning techniques for handling data streams, these approaches…

Machine Learning · Computer Science 2021-06-23 Xuyang Yan, Abdollah Homaifar, Mrinmoy Sarkar, Abenezer Girma, Edward Tunstel

CPU- and GPU-based Distributed Sampling in Dirichlet Process Mixtures for Large-scale Analysis

In the realm of unsupervised learning, Bayesian nonparametric mixture models, exemplified by the Dirichlet Process Mixture Model (DPMM), provide a principled approach for adapting the complexity of the model to the data. Such models are…

Machine Learning · Computer Science 2022-04-20 Or Dinari, Raz Zamir, John W. Fisher, Oren Freifeld

Online Clustering by Penalized Weighted GMM

With the dawn of the Big Data era, data sets are growing rapidly. Data is streaming from everywhere - from cameras, mobile phones, cars, and other electronic devices. Clustering streaming data is a very challenging problem. Unlike the…

Machine Learning · Computer Science 2019-02-08 Shlomo Bugdary, Shay Maymon

DPGIIL: Dirichlet Process-Deep Generative Model-Integrated Incremental Learning for Clustering in Transmissibility-based Online Structural Anomaly Detection

Clustering based on vibration responses, such as transmissibility functions (TFs), is promising in structural anomaly detection. However, most existing methods struggle to determine the optimal cluster number, handle high-dimensional…

Machine Learning · Computer Science 2025-10-21 Lin-Feng Mei, Wang-Ji Yan

Data Stream Clustering: A Review

The number of connected devices is steadily increasing, and these devices continuously generate data streams. Real-time processing of data streams is arousing interest despite many challenges. Clustering is one of the most suitable methods for…

Machine Learning · Computer Science 2020-07-22 Alaettin Zubaroğlu, Volkan Atalay

Clustering Stream Data by Exploring the Evolution of Density Mountain

Stream clustering is a fundamental problem in many streaming data analysis applications. Compared to classical batch-mode clustering, there are two key challenges in stream clustering: (i) Given that input data are changing continuously,…

Databases · Computer Science 2017-10-04 Shufeng Gong, Yanfeng Zhang, Ge Yu

ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures

The Dirichlet process (DP) is a fundamental mathematical tool for Bayesian nonparametric modeling, and is widely used in tasks such as density estimation, natural language processing, and time series modeling. Although MCMC inference…

Machine Learning · Statistics 2013-04-09 Dan Lovell, Jonathan Malmaud, Ryan P. Adams, Vikash K. Mansinghka

Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models

We develop a sequential low-complexity inference procedure for Dirichlet process mixtures of Gaussians for online clustering and parameter estimation when the number of clusters is unknown a priori. We present an easily computable, closed…

Machine Learning · Statistics 2015-09-15 Theodoros Tsiligkaridis, Keith W. Forsythe

Stream Clustering using Probabilistic Data Structures

Most density based stream clustering algorithms separate the clustering process into an online and offline component. Exact summarized statistics are being employed for defining micro-clusters or grid cells during the online stage followed…

Databases · Computer Science 2016-12-09 Andrei Sorin Sabau

Clustering consistency with Dirichlet process mixtures

Dirichlet process mixtures are flexible non-parametric models, particularly suited to density estimation and probabilistic clustering. In this work we study the posterior distribution induced by Dirichlet process mixtures as the sample size…

Statistics Theory · Mathematics 2022-11-29 Filippo Ascolani, Antonio Lijoi, Giovanni Rebaudo, Giacomo Zanella

Sequential Dirichlet Process Mixtures of Multivariate Skew t-distributions for Model-based Clustering of Flow Cytometry Data

Flow cytometry is a high-throughput technology used to quantify multiple surface and intracellular markers at the level of a single cell. This makes it possible to identify cell sub-types and to determine their relative proportions. Improvements of…

Machine Learning · Statistics 2022-11-10 Boris P. Hejblum, Chariff Alkhassim, Raphael Gottardo, François Caron, Rodolphe Thiébaut

Streaming Inference for Infinite Non-Stationary Clustering

Learning from a continuous stream of non-stationary data in an unsupervised manner is arguably one of the most common and most challenging settings facing intelligent agents. Here, we attack learning under all three conditions…

Machine Learning · Computer Science 2023-05-23 Rylan Schaeffer, Gabrielle Kaili-May Liu, Yilun Du, Scott Linderman, Ila Rani Fiete

DPMM-CFL: Clustered Federated Learning via Dirichlet Process Mixture Model Nonparametric Clustering

Clustered Federated Learning (CFL) improves performance under non-IID client heterogeneity by clustering clients and training one model per cluster, thereby balancing between a global model and fully personalized models. However, most CFL…

Machine Learning · Computer Science 2026-01-30 Mariona Jaramillo-Civill, Peng Wu, Pau Closas

Clustering Categorical Data Streams

The data stream model has been defined for new classes of applications involving massive data being generated at a fast pace. Web click stream analysis and detection of network intrusions are two examples. Cluster analysis on data streams…

Databases · Computer Science 2007-05-23 Zengyou He, Xiaofei Xu, Shengchun Deng, Joshua Zhexue Huang

Incremental Gaussian Mixture Clustering for Data Streams

Analyzing data streams of very large volume is an important problem, and solutions are highly desirable in many application domains. In this paper we present and demonstrate the effective operation of an algorithm to find clusters and anomalous data…

Machine Learning · Computer Science 2025-03-25 Aniket Bhanderi, Raj Bhatnagar