Related papers: Fusing heterogeneous data sets

Testing Homogeneity: The Trouble with Sparse Functional Data

Testing the homogeneity between two samples of functional data is an important task. While this is feasible for intensely measured functional data, we explain why it is challenging for sparsely measured functional data and show what can be…

Methodology · Statistics 2022-07-05 Changbo Zhu, Jane-Ling Wang

Longitudinal Omics Data Analysis: A Review on Models, Algorithms, and Tools

Longitudinal omics data (LOD) analysis is essential for understanding the dynamics of biological processes and disease progression over time. This review explores various statistical and computational approaches for analyzing such data,…

Methodology · Statistics 2025-06-16 Ali R. Taheriyoun, Allen Ross, Abolfazl Safikhani, Damoon Soudbakhsh, Ali Rahnavard

How to learn from inconsistencies: Integrating molecular simulations with experimental data

Molecular simulations and biophysical experiments can be used to provide independent and complementary insights into the molecular origin of biological processes. A particularly useful strategy is to use molecular simulations as a modelling…

Chemical Physics · Physics 2019-12-10 Simone Orioli, Andreas Haahr Larsen, Sandro Bottaro, Kresten Lindorff-Larsen

Reverse enGENEering of regulatory networks from Big Data: a guide for a biologist

Omics technologies enable unbiased investigation of biological systems through massively parallel sequence acquisition or molecular measurements, bringing the life sciences into the era of Big Data. A central challenge posed by such omics…

Molecular Networks · Quantitative Biology 2014-11-04 Xiaoxi Dong, Anatoly Yambartsev, Stephen Ramsey, Lina Thomas, Natalia Shulzhenko, Andrey Morgun

Hierarchical Bayesian Data Fusion for Robotic Platform Navigation

Data fusion has become an active research topic in recent years. Growing computational performance has allowed the use of redundant sensors to measure a single phenomenon. While Bayesian fusion approaches are common in general applications,…

Robotics · Computer Science 2017-04-25 Andres F. Echeverri, Henry Medeiros, Ryan Walsh, Yevgeniy Reznichenko, Richard Povinelli

Investigating the heterogeneity of "study twins"

Meta-analyses are commonly performed based on random-effects models, while in certain cases one might also argue in favour of a common-effect model. One such case may be given by the example of two "study twins" that are performed according…

Methodology · Statistics 2024-09-04 Christian Röver, Tim Friede

Multiple Metric Learning for Structured Data

We address the problem of merging graph and feature-space information while learning a metric from structured data. Existing algorithms tackle the problem in an asymmetric way, by either extracting vectorized summaries of the graph…

Machine Learning · Computer Science 2020-02-17 Nicolo Colombo

Joint Modeling and Registration of Cell Populations in Cohorts of High-Dimensional Flow Cytometric Data

In systems biomedicine, an experimenter encounters different potential sources of variation in data such as individual samples, multiple experimental conditions, and multi-variable network-level responses. In multiparametric cytometry,…

Machine Learning · Statistics 2015-06-16 Saumyadipta Pyne, Kui Wang, Jonathan Irish, Pablo Tamayo, Marc-Danie Nazaire, Tarn Duong, Sharon Lee, Shu-Kay Ng, David Hafler, Ronald Levy, Garry Nolan, Jill Mesirov, Geoffrey J. McLachlan

Frequency-domain alignment of heterogeneous, multidimensional separations data through complex orthogonal Procrustes analysis

Multidimensional separations data have the capacity to reveal detailed information about complex biological samples. However, data analysis has been an ongoing challenge in the area since the peaks that represent chemical factors may drift…

Numerical Analysis · Mathematics 2025-02-19 Michael Sorochan Armstrong

On the use of information fusion techniques to improve information quality: Taxonomy, opportunities and challenges

The information fusion field has recently been attracting a lot of interest within the scientific community, as it provides, through the combination of different sources of heterogeneous information, a fuller and/or more precise…

Information Theory · Computer Science 2025-10-28 Raúl Gutiérrez, Víctor Rampérez, Horacio Paggi, Juan A. Lara, Javier Soriano

A fully data-driven method for estimating density level sets

Density level sets can be estimated using plug-in methods, excess mass algorithms or a hybrid of the two previous methodologies. The plug-in algorithms are based on replacing the unknown density by some nonparametric estimator, usually the…

Statistics Theory · Mathematics 2016-11-26 A. Rodríguez-Casal, P. Saavedra-Nieves

Bayesian outcome-guided multi-view mixture models with applications in molecular precision medicine

Clustering is commonly performed as an initial analysis step for uncovering structure in 'omics datasets, e.g. to discover molecular subtypes of disease. The high-throughput, high-dimensional nature of these datasets means that they provide…

Methodology · Statistics 2023-03-02 Paul D. W. Kirk, Filippo Pagani, Sylvia Richardson

Methods for Quantifying Dataset Similarity: a Review, Taxonomy and Comparison

Quantifying the similarity between datasets has widespread applications in statistics and machine learning. The performance of a predictive model on novel datasets, referred to as generalizability, depends on how similar the training and…

Methodology · Statistics 2025-06-18 Marieke Stolte, Franziska Kappenberg, Jörg Rahnenführer, Andrea Bommert

Analysing Fuzzy Sets Through Combining Measures of Similarity and Distance

Reasoning with fuzzy sets can be achieved through measures such as similarity and distance. However, these measures can often give misleading results when considered independently, for example giving the same value for two different pairs…

Artificial Intelligence · Computer Science 2014-09-04 Josie McCulloch, Christian Wagner, Uwe Aickelin

A Graphical Model for Fusing Diverse Microbiome Data

This paper develops a Bayesian graphical model for fusing disparate types of count data. The motivating application is the study of bacterial communities from diverse high dimensional features, in this case transcripts, collected from…

Methodology · Statistics 2022-12-29 Mehmet Aktukmak, Haonan Zhu, Marc G. Chevrette, Julia Nepper, Shruthi Magesh, Jo Handelsman, Alfred Hero

DAFTED: Decoupled Asymmetric Fusion of Tabular and Echocardiographic Data for Cardiac Hypertension Diagnosis

Multimodal data fusion is a key approach for enhancing diagnosis in medical applications. We propose an asymmetric fusion strategy starting from a primary modality and integrating secondary modalities by disentangling shared and…

Computer Vision and Pattern Recognition · Computer Science 2025-09-22 Jérémie Stym-Popper, Nathan Painchaud, Clément Rambour, Pierre-Yves Courand, Nicolas Thome, Olivier Bernard

Bayesian nonparametric cross-study validation of prediction methods

We consider comparisons of statistical learning algorithms using multiple data sets, via leave-one-in cross-study validation: each of the algorithms is trained on one data set; the resulting model is then validated on each remaining data…

Applications · Statistics 2015-06-02 Lorenzo Trippa, Levi Waldron, Curtis Huttenhower, Giovanni Parmigiani

Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning

The complementary information found in different modalities of patient data can aid in more accurate modelling of a patient's disease state and a better understanding of the underlying biological processes of a disease. However, the…

Machine Learning · Computer Science 2025-04-18 Annette Spooner, Mohammad Karimi Moridani, Azadeh Safarchi, Salim Maher, Fatemeh Vafaee, Amany Zekry, Arcot Sowmya

Data integration in high dimension with multiple quantiles

This article deals with the analysis of high dimensional data that come from multiple sources (experiments) and thus have different possibly correlated responses, but share the same set of predictors. The measurements of the predictors may…

Methodology · Statistics 2020-07-01 Guorong Dai, Ursula U. Müller, Raymond J. Carroll

Data Transformation Strategies to Remove Heterogeneity

Data heterogeneity is a prevalent issue, stemming from various conflicting factors, making its utilization complex. This uncertainty, particularly resulting from disparities in data formats, frequently necessitates the involvement of…

Machine Learning · Computer Science 2025-07-18 Sangbong Yoo, Jaeyoung Lee, Chanyoung Yoon, Geonyeong Son, Hyein Hong, Seongbum Seo, Soobin Yim, Chanyoung Jung, Jungsoo Park, Misuk Kim, Yun Jang