Related papers: Bayesian Supervised Causal Clustering
The task of clustering a set of objects based on multiple sources of data arises in several modern applications. We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These…
Clustering has long been a popular unsupervised learning approach to identify groups of similar objects and discover patterns from unlabeled data in many applications. Yet, coming up with meaningful interpretations of the estimated clusters…
Estimating heterogeneous treatment effects is critical in domains such as personalized medicine, resource allocation, and policy evaluation. A central challenge lies in identifying subpopulations that respond differently to interventions,…
Clustering is commonly performed as an initial analysis step for uncovering structure in 'omics datasets, e.g. to discover molecular subtypes of disease. The high-throughput, high-dimensional nature of these datasets means that they provide…
Cluster analysis methods are used to identify homogeneous subgroups in a data set. In biomedical applications, one frequently applies cluster analysis in order to identify biologically interesting subgroups. In particular, one may wish to…
Bi-clustering is a useful approach in analyzing biological data when observations come from heterogeneous groups and have a large number of features. We outline a general Bayesian approach in tackling bi-clustering problems in moderate to…
Clustering is a crucial task in various domains of knowledge, including medicine, epidemiology, genomics, environmental science, economics, and visual sciences, among others. Methodologies for inferring the number of clusters have often…
The discovery of disease subtypes is an essential step for developing precision medicine, and disease subtyping via omics data has become a popular approach. While promising, subtypes obtained from conventional approaches may not be…
We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild…
The paper presents a novel approach for unsupervised techniques in the field of clustering. A new method is proposed to enhance existing literature models using the proper Bayesian bootstrap to improve results in terms of robustness and…
Choosing appropriate hyperparameters for unsupervised clustering algorithms in an optimal way depending on the problem under study is a long standing challenge, which we tackle while adapting clustering algorithms for immune disorder…
Determining phenotypes of diseases can have considerable benefits for in-hospital patient care and to drug development. The structure of high dimensional data sets such as electronic health records are often represented through an embedding…
Understanding treatment effect heterogeneity is vital for scientific and policy research. However, identifying and evaluating heterogeneous treatment effects pose significant challenges due to the typically unknown subgroup structure.…
We present a novel framework for concomitant dimension reduction and clustering. This framework is based on a novel class of Bayesian clustering factor models. These models assume a factor model structure where the vectors of common factors…
We consider an extension of model-based clustering to the semi-supervised case, where some of the data are pre-labeled. We provide a derivation of the Bayesian Information Criterion (BIC) approximation to the Bayes factor in this setting.…
Size-constrained clustering (SCC) refers to the dual problem of using observations to determine latent cluster structure while at the same time assigning observations to the unknown clusters subject to an analyst defined constraint on…
Clustering is a well-known unsupervised machine learning approach capable of automatically grouping discrete sets of instances with similar characteristics. Constrained clustering is a semi-supervised extension to this process that can be…
Clustering mixed-type data remains a major challenge in biomedical research to uncover clinically meaningful subgroups within heterogeneous patient populations. Most existing clustering methods impose restrictive assumptions like local…
A key question in causal inference analyses is how to find subgroups with elevated treatment effects. This paper takes a machine learning approach and introduces a generative model, Causal Rule Sets (CRS), for interpretable subgroup…
Disease subtype identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to infer disease subtypes, which often lead to biologically meaningful insights into disease. Despite…