Related papers: Directionally Dependent Multi-View Clustering Usin…

Graphical model-based clustering of categorical data

Clustering multivariate data is a pervasive task in many applied problems, particularly in social studies and life science. Model-based approaches to clustering rely on mixture models, where each mixture component corresponds to the kernel…

Methodology · Statistics 2026-01-22 Laura Ferrini, Federico Castelletti

Copula Mixture Model for Dependency-seeking Clustering

We introduce a copula mixture model to perform dependency-seeking clustering when co-occurring samples from different data sources are available. The model takes advantage of the great flexibility offered by the copulas framework to extend…

Methodology · Statistics 2012-07-03 Melanie Rey, Volker Roth

Cross-Modal Alignment via Variational Copula Modelling

Various data modalities are common in real-world applications (e.g., electronic health records, medical images and clinical notes in healthcare). It is essential to develop multimodal learning methods to aggregate various information from…

Machine Learning · Computer Science 2025-11-06 Feng Wu, Tsai Hor Chan, Fuying Wang, Guosheng Yin, Lequan Yu

An interpretable multiple kernel learning approach for the discovery of integrative cancer subtypes

Due to the complexity of cancer, clustering algorithms have been used to disentangle the observed heterogeneity and identify cancer subtypes that can be treated specifically. While kernel based clustering approaches allow the use of more…

Machine Learning · Statistics 2018-11-21 Nora K. Speicher, Nico Pfeifer

Model-based clustering using copulas with applications

The majority of model-based clustering techniques is based on multivariate Normal models and their variants. In this paper copulas are used for the construction of flexible families of models for clustering applications. The use of copulas…

Methodology · Statistics 2018-02-16 Ioannis Kosmidis, Dimitris Karlis

Spatial clustering of array CGH features in combination with hierarchical multiple testing

We propose a new approach for clustering DNA features using array CGH data from multiple tumor samples. We distinguish data-collapsing: joining contiguous DNA clones or probes with extremely similar data into regions, from clustering:…

Applications · Statistics 2010-12-21 Kyung In Kim, Etienne Roquain, Mark Van De Wiel

Bayesian Copula Directional Dependence for causal inference on gene expression data

Modelling and understanding directional gene networks is a major challenge in biology as they play an important role in the architecture and function of genetic systems. Copula Directional Dependence (CDD) can measure the directed…

Methodology · Statistics 2022-03-11 Vasiliki Vamvaka, Clara Grazian

Bayesian outcome-guided multi-view mixture models with applications in molecular precision medicine

Clustering is commonly performed as an initial analysis step for uncovering structure in 'omics datasets, e.g. to discover molecular subtypes of disease. The high-throughput, high-dimensional nature of these datasets means that they provide…

Methodology · Statistics 2023-03-02 Paul D. W. Kirk, Filippo Pagani, Sylvia Richardson

Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

We propose a novel method for multiple clustering that assumes a co-clustering structure (partitions in both rows and columns of the data matrix) in each view. The new method is applicable to high-dimensional data. It is based on a…

Machine Learning · Statistics 2019-07-03 Tomoki Tokuda, Junichiro Yoshimoto, Yu Shimizu, Shigeru Toki, Go Okada, Masahiro Takamura, Tetsuya Yamamoto, Shinpei Yoshimura, Yasumasa Okamoto, Shigeto Yamawaki, Kenji Doya

A new correlation clustering method for cancer mutation analysis

Cancer genomes exhibit a large number of different alterations that affect many genes in a diverse manner. It is widely believed that these alterations follow combinatorial patterns that have a strong connection with the underlying…

Machine Learning · Computer Science 2016-01-26 Jack P. Hou, Amin Emad, Gregory J. Puleo, Jian Ma, Olgica Milenkovic

Copula Based Fusion of Clinical and Genomic Machine Learning Risk Scores for Breast Cancer Risk Stratification

Clinical and genomic models are both used to predict breast cancer outcomes, but they are often combined using simple linear rules that do not account for how their risk scores relate, especially at the extremes. Using the METABRIC breast…

Machine Learning · Computer Science 2025-11-25 Agnideep Aich, Sameera Hewage, Md Monzur Murshed

Longitudinal Data Clustering with a Copula Kernel Mixture Model

Many common clustering methods cannot be used for clustering multivariate longitudinal data in cases where variables exhibit high autocorrelations. In this article, a copula kernel mixture model (CKMM) is proposed for clustering data of…

Methodology · Statistics 2025-06-23 Xi Zhang, Orla A. Murphy, Paul D. McNicholas

Structure-guided Deep Multi-View Clustering

Deep multi-view clustering seeks to utilize the abundant information from multiple views to improve clustering performance. However, most of the existing clustering methods often neglect to fully mine multi-view structural information and…

Computer Vision and Pattern Recognition · Computer Science 2025-03-17 Jinrong Cui, Xiaohuang Wu, Haitao Zhang, Chongjie Dong, Jie Wen

Vine copula mixture models and clustering for non-Gaussian data

The majority of finite mixture models suffer from not allowing asymmetric tail dependencies within components and not capturing non-elliptical clusters in clustering applications. Since vine copulas are very flexible in capturing these…

Methodology · Statistics 2021-09-09 Özge Sahin, Claudia Czado

Dimensionally Reduced Open-World Clustering: DROWCULA

Working with annotated data is the cornerstone of supervised learning. Nevertheless, providing labels to instances is a task that requires significant human effort. Several critical real-world applications make things more complicated…

Computer Vision and Pattern Recognition · Computer Science 2025-09-10 Erencem Ozbey, Dimitrios I. Diochnos

Hybrid copula mixed models for combining case-control and cohort studies in meta-analysis of diagnostic tests

Copula mixed models for trivariate (or bivariate) meta-analysis of diagnostic test accuracy studies accounting (or not) for disease prevalence have been proposed in the biostatistics literature to synthesize information. However, many…

Methodology · Statistics 2018-07-12 Aristidis K. Nikoloulopoulos

A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology

We propose a new methodology for selecting and ranking covariates associated with a variable of interest in a context of high-dimensional data under dependence but few observations. The methodology successively intertwines the clustering of…

Statistics Theory · Mathematics 2019-09-13 Bérangère Bastien, Taha Boukhobza, Hélène Dumond, Anne Gégout-Petit, Aurélie Muller-Gueudin, Charlène Thiébaut

Testing the Homogeneity of Proportions for Correlated Bilateral Data via the Clayton Copula

Handling highly dependent data is crucial in clinical trials, particularly in fields related to ophthalmology. Incorrectly specifying the dependency structure can lead to biased inferences. Traditionally, models rely on three fixed…

Methodology · Statistics 2025-09-30 Shuyi Liang, Takeshi Emura, Chang-Xing Ma, Yijing Xin, Xin-Wei Huang

Bayesian Consensus Clustering

The task of clustering a set of objects based on multiple sources of data arises in several modern applications. We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These…

Machine Learning · Statistics 2015-12-01 Eric F. Lock, David B. Dunson

Are Clusterings of Multiple Data Views Independent?

In the Pioneer 100 (P100) Wellness Project (Price and others, 2017), multiple types of data are collected on a single set of healthy participants at multiple timepoints in order to characterize and optimize wellness. One way to do this is…

Methodology · Statistics 2019-01-15 Lucy L. Gao, Jacob Bien, Daniela Witten