Related papers: Sparse Bayesian Hierarchical Modeling of High-dime…

A Bayesian non-parametric method for clustering high-dimensional binary data

In many real life problems, objects are described by large number of binary features. For instance, documents are characterized by presence or absence of certain keywords; cancer patients are characterized by presence or absence of certain…

Applications · Statistics 2016-03-09 Tapesh Santra

ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures

The Dirichlet process (DP) is a fundamental mathematical tool for Bayesian nonparametric modeling, and is widely used in tasks such as density estimation, natural language processing, and time series modeling. Although MCMC inference…

Machine Learning · Statistics 2013-04-09 Dan Lovell, Jonathan Malmaud, Ryan P. Adams, Vikash K. Mansinghka

Flexible Bayesian Nonparametric Product Mixtures for Multi-scale Functional Clustering

There is a rich literature on clustering functional data with applications to time-series modeling, trajectory data, and even spatio-temporal applications. However, existing methods routinely perform global clustering that enforces…

Methodology · Statistics 2024-12-16 Tsung-Hung Yao, Suprateek Kundu

A Nonparametric Bayesian Method for Clustering of High-Dimensional Mixed Dataset

The paper is motivated from clustering problem in high-throughput mixed datasets. Clustering of such datasets can provide much insight into biological associations. An open problem in this context is to simultaneously cluster…

Methodology · Statistics 2018-08-15 Chetkar Jha

Sparse tree-based clustering of microbiome data to characterize microbiome heterogeneity in pancreatic cancer

There is a keen interest in characterizing variation in the microbiome across cancer patients, given increasing evidence of its important role in determining treatment outcomes. Here our goal is to discover subgroups of patients with…

Applications · Statistics 2022-12-06 Yushu Shi, Liangliang Zhang, Kim-Anh Do, Robert Jenq, Christine Peterson

Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics

The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang and Dunson,…

Methodology · Statistics 2018-10-15 Oliver M. Crook, Laurent Gatto, Paul D. W. Kirk

Flexible clustering via hidden hierarchical Dirichlet priors

The Bayesian approach to inference stands out for naturally allowing borrowing information across heterogeneous populations, with different samples possibly sharing the same distribution. A popular Bayesian nonparametric model for…

Methodology · Statistics 2022-01-25 Antonio Lijoi, Igor Prünster, Giovanni Rebaudo

A Sparse Factor Model for Clustering High-Dimensional Longitudinal Data

Recent advances in engineering technologies have enabled the collection of a large number of longitudinal features. This wealth of information presents unique opportunities for researchers to investigate the complex nature of diseases and…

Methodology · Statistics 2023-11-27 Zihang Lu, Noirrit Kiran Chandra

Covariate-dependent hierarchical Dirichlet processes

Bayesian hierarchical modeling is a natural framework to effectively integrate data and borrow information across groups. In this paper, we address problems related to density estimation and identifying clusters across related groups, by…

Methodology · Statistics 2025-10-29 Huizi Zhang, Sara Wade, Natalia Bochkina

Clustering based on Mixtures of Sparse Gaussian Processes

Creating low dimensional representations of a high dimensional data set is an important component in many machine learning applications. How to cluster data using their low dimensional embedded space is still a challenging problem in…

Machine Learning · Computer Science 2023-03-27 Zahra Moslehi, Abdolreza Mirzaei, Mehran Safayani

A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior

We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is…

Machine Learning · Computer Science 2009-07-07 Hal Daumé, Daniel Marcu

Bayesian clustering of replicated time-course gene expression data with weak signals

To identify novel dynamic patterns of gene expression, we develop a statistical method to cluster noisy measurements of gene expression collected from multiple replicates at multiple time points, with an unknown number of clusters. We…

Applications · Statistics 2013-12-02 Audrey Qiuyan Fu, Steven Russell, Sarah J. Bray, Simon Tavaré

Global-Local Dirichlet Processes for Identifying Pan-Cancer Subpopulations Using Both Shared and Cancer-Specific Data

We consider the problem of clustering grouped data for which the observations may include group-specific variables in addition to the variables that are shared across groups. This type of data is common in cancer genomics where the…

Methodology · Statistics 2025-09-30 Arhit Chakrabarti, Yang Ni, Debdeep Pati, Bani K. Mallick

Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts

We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partitions groups into clusters. Using…

Machine Learning · Computer Science 2014-01-30 Vu Nguyen, Dinh Phung, XuanLong Nguyen, Svetha Venkatesh, Hung Hai Bui

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie, Wei Pan, Xiaotong Shen

Nonparametric Variable Selection, Clustering and Prediction for High-Dimensional Regression

The development of parsimonious models for reliable inference and prediction of responses in high-dimensional regression settings is often challenging due to relatively small sample sizes and the presence of complex interaction patterns…

Methodology · Statistics 2016-04-15 Subharup Guha, Veerabhadran Baladandayuthapani

An Empirical Bayes Approach for High Dimensional Classification

We propose an empirical Bayes estimator based on Dirichlet process mixture model for estimating the sparse normalized mean difference, which could be directly applied to the high dimensional linear classification. In theory, we build a…

Machine Learning · Statistics 2017-02-17 Yunbo Ouyang, Feng Liang

Fast search for Dirichlet process mixture models

Dirichlet process (DP) mixture models provide a flexible Bayesian framework for density estimation. Unfortunately, their flexibility comes at a cost: inference in DP mixture models is computationally expensive, even when conjugate…

Machine Learning · Computer Science 2009-07-13 Hal Daumé

A Split-Merge MCMC Algorithm for the Hierarchical Dirichlet Process

The hierarchical Dirichlet process (HDP) has become an important Bayesian nonparametric model for grouped data, such as document collections. The HDP is used to construct a flexible mixed-membership model where the number of components is…

Machine Learning · Statistics 2012-01-10 Chong Wang, David M. Blei

Bayesian Clustering of Transcription Factor Binding Motifs

Genes are often regulated in living cells by proteins called transcription factors (TFs) that bind directly to short segments of DNA in close proximity to specific genes. These binding sites have a conserved nucleotide appearance, which is…

Statistics Theory · Mathematics 2007-06-13 Shane T. Jensen, Jun S. Liu