Related papers: Sampling in Dirichlet Process Mixture Models for C…
In recent years, we have seen a handful of work on inference algorithms over non-stationary data streams. Given their flexibility, Bayesian non-parametric models are a good candidate for these scenarios. However, reliable streaming…
In theory, Bayesian nonparametric (BNP) models are well suited to streaming data scenarios due to their ability to adapt model complexity with the observed data. Unfortunately, such benefits have not been fully realized in practice;…
We consider the estimation of Dirichlet Process Mixture Models (DPMMs) in distributed environments, where data are distributed across multiple computing nodes. A key advantage of Bayesian nonparametric models such as DPMMs is that they…
Motivation: With the development of droplet based systems, massive single cell transcriptome data has become available, which enables analysis of cellular and molecular processes at single cell resolution and is instrumental to…
This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance…
The non-stationary nature of data streams strongly challenges traditional machine learning techniques. Although some solutions have been proposed to extend traditional machine learning techniques for handling data streams, these approaches…
In the realm of unsupervised learning, Bayesian nonparametric mixture models, exemplified by the Dirichlet Process Mixture Model (DPMM), provide a principled approach for adapting the complexity of the model to the data. Such models are…
With the dawn of the Big Data era, data sets are growing rapidly. Data is streaming from everywhere - from cameras, mobile phones, cars, and other electronic devices. Clustering streaming data is a very challenging problem. Unlike the…
Clustering based on vibration responses, such as transmissibility functions (TFs), is promising in structural anomaly detection. However, most existing methods struggle to determine the optimal cluster number, handle high-dimensional…
Number of connected devices is steadily increasing and these devices continuously generate data streams. Real-time processing of data streams is arousing interest despite many challenges. Clustering is one of the most suitable methods for…
Stream clustering is a fundamental problem in many streaming data analysis applications. Comparing to classical batch-mode clustering, there are two key challenges in stream clustering: (i) Given that input data are changing continuously,…
The Dirichlet process (DP) is a fundamental mathematical tool for Bayesian nonparametric modeling, and is widely used in tasks such as density estimation, natural language processing, and time series modeling. Although MCMC inference…
We develop a sequential low-complexity inference procedure for Dirichlet process mixtures of Gaussians for online clustering and parameter estimation when the number of clusters are unknown a-priori. We present an easily computable, closed…
Most density based stream clustering algorithms separate the clustering process into an online and offline component. Exact summarized statistics are being employed for defining micro-clusters or grid cells during the online stage followed…
Dirichlet process mixtures are flexible non-parametric models, particularly suited to density estimation and probabilistic clustering. In this work we study the posterior distribution induced by Dirichlet process mixtures as the sample size…
Flow cytometry is a high-throughput technology used to quantify multiple surface and intracellular markers at the level of a single cell. This enables to identify cell sub-types, and to determine their relative proportions. Improvements of…
Learning from a continuous stream of non-stationary data in an unsupervised manner is arguably one of the most common and most challenging settings facing intelligent agents. Here, we attack learning under all three conditions…
Clustered Federated Learning (CFL) improves performance under non-IID client heterogeneity by clustering clients and training one model per cluster, thereby balancing between a global model and fully personalized models. However, most CFL…
The data stream model has been defined for new classes of applications involving massive data being generated at a fast pace. Web click stream analysis and detection of network intrusions are two examples. Cluster analysis on data streams…
The problem of analyzing data streams of very large volumes is important and is very desirable for many application domains. In this paper we present and demonstrate effective working of an algorithm to find clusters and anomalous data…