Related papers: Analysis of a Gibbs sampler method for model based…
Cluster analysis of biological samples using gene expression measurements is a common task which aids the discovery of heterogeneous biological sub-populations having distinct mRNA profiles. Several model-based clustering algorithms have…
The identification of co-regulated genes and their transcription-factor binding sites (TFBS) are the key steps toward understanding transcription regulation. In addition to effective laboratory assays, various bi-clustering algorithms for…
Discrete mixture models provide a well-known basis for effective clustering algorithms, although technical challenges have limited their scope. In the context of gene-expression data analysis, a model is presented that mixes over a finite…
We present a novel framework for concomitant dimension reduction and clustering. This framework is based on a novel class of Bayesian clustering factor models. These models assume a factor model structure where the vectors of common factors…
Clustering is a popular data mining technique that aims to partition an input space into multiple homogeneous regions. There exist several clustering algorithms in the literature. The performance of a clustering algorithm depends on its…
To understand complex biological systems, the research community has produced huge corpus of gene expression data. A large number of clustering approaches have been proposed for the analysis of gene expression data. However, extracting…
Clustering is one of the widely used data mining techniques for medical diagnosis. Clustering can be considered as the most important unsupervised learning technique. Most of the clustering methods group data based on distance and few…
Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. Identification of co-expressed genes and coherent patterns is the central goal in microarray or…
We present a new approach for the analysis of genome-wide expression data. Our method is designed to overcome the limitations of traditional techniques, when applied to large-scale data. Rather than alloting each gene to a single cluster,…
Finite mixture models are frequently used to uncover latent structures in high-dimensional datasets (e.g.\ identifying clusters of patients in electronic health records). The inference of such structures can be performed in a Bayesian…
Recent work on overfitting Bayesian mixtures of distributions offers a powerful framework for clustering multivariate data using a latent Gaussian model which resembles the factor analysis model. The flexibility provided by overfitting…
Next-generation sequencing technologies provide a revolutionary tool for generating gene expression data. Starting with a fixed RNA sample, they construct a library of millions of differentially abundant short sequence tags or "reads",…
Traditional clustering methods are limited when dealing with huge and heterogeneous groups of gene expression data, which motivates the development of bi-clustering methods. Bi-clustering methods are used to mine bi-clusters whose subsets…
Non-Gaussian mixture models are gaining increasing attention for mixture model-based clustering particularly when dealing with data that exhibit features such as skewness and heavy tails. Here, such a mixture distribution is presented,…
Bi-clustering is a useful approach in analyzing biological data when observations come from heterogeneous groups and have a large number of features. We outline a general Bayesian approach in tackling bi-clustering problems in moderate to…
We study the convergence properties of the Gibbs Sampler in the context of posterior distributions arising from Bayesian analysis of conditionally Gaussian hierarchical models. We develop a multigrid approach to derive analytic expressions…
We study the sparse high-dimensional Gaussian mixture model when the number of clusters is allowed to grow with the sample size. A minimax lower bound for parameter estimation is established, and we show that a constrained maximum…
We propose a new approach for clustering DNA features using array CGH data from multiple tumor samples. We distinguish data-collapsing: joining contiguous DNA clones or probes with extremely similar data into regions, from clustering:…
We propose and develop a Bayesian plaid model for biclustering that accounts for the prior dependency between genes (and/or conditions) through a stochastic relational graph. This work is motivated by the need for improved understanding of…
We present a novel coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant…