Related papers: Fusing heterogeneous data sets

200 papers

Testing the homogeneity between two samples of functional data is an important task. While this is feasible for intensely measured functional data, we explain why it is challenging for sparsely measured functional data and show what can be…

Methodology · Statistics 2022-07-05 Changbo Zhu, Jane-Ling Wang

Longitudinal omics data (LOD) analysis is essential for understanding the dynamics of biological processes and disease progression over time. This review explores various statistical and computational approaches for analyzing such data,…

Methodology · Statistics 2025-06-16 Ali R. Taheriyoun, Allen Ross, Abolfazl Safikhani, Damoon Soudbakhsh, Ali Rahnavard

Molecular simulations and biophysical experiments can be used to provide independent and complementary insights into the molecular origin of biological processes. A particularly useful strategy is to use molecular simulations as a modelling…

Chemical Physics · Physics 2019-12-10 Simone Orioli, Andreas Haahr Larsen, Sandro Bottaro, Kresten Lindorff-Larsen

Omics technologies enable unbiased investigation of biological systems through massively parallel sequence acquisition or molecular measurements, bringing the life sciences into the era of Big Data. A central challenge posed by such omics…

Molecular Networks · Quantitative Biology 2014-11-04 Xiaoxi Dong, Anatoly Yambartsev, Stephen Ramsey, Lina Thomas, Natalia Shulzhenko, Andrey Morgun

Data fusion has become an active research topic in recent years. Growing computational performance has allowed the use of redundant sensors to measure a single phenomenon. While Bayesian fusion approaches are common in general applications,…

Robotics · Computer Science 2017-04-25 Andres F. Echeverri, Henry Medeiros, Ryan Walsh, Yevgeniy Reznichenko, Richard Povinelli

Meta-analyses are commonly performed based on random-effects models, while in certain cases one might also argue in favour of a common-effect model. One such case may be given by the example of two "study twins" that are performed according…

Methodology · Statistics 2024-09-04 Christian Röver, Tim Friede

We address the problem of merging graph and feature-space information while learning a metric from structured data. Existing algorithms tackle the problem in an asymmetric way, by either extracting vectorized summaries of the graph…

Machine Learning · Computer Science 2020-02-17 Nicolo Colombo

In systems biomedicine, an experimenter encounters different potential sources of variation in data such as individual samples, multiple experimental conditions, and multi-variable network-level responses. In multiparametric cytometry,…

Multidimensional separations data have the capacity to reveal detailed information about complex biological samples. However, data analysis has been an ongoing challenge in the area since the peaks that represent chemical factors may drift…

Numerical Analysis · Mathematics 2025-02-19 Michael Sorochan Armstrong

The information fusion field has recently been attracting a lot of interest within the scientific community, as it provides, through the combination of different sources of heterogeneous information, a fuller and/or more precise…

Information Theory · Computer Science 2025-10-28 Raúl Gutiérrez, Víctor Rampérez, Horacio Paggi, Juan A. Lara, Javier Soriano

Density level sets can be estimated using plug-in methods, excess mass algorithms or a hybrid of the two previous methodologies. The plug-in algorithms are based on replacing the unknown density by some nonparametric estimator, usually the…

Statistics Theory · Mathematics 2016-11-26 A. Rodríguez-Casal, P. Saavedra-Nieves

Clustering is commonly performed as an initial analysis step for uncovering structure in 'omics datasets, e.g. to discover molecular subtypes of disease. The high-throughput, high-dimensional nature of these datasets means that they provide…

Methodology · Statistics 2023-03-02 Paul D. W. Kirk, Filippo Pagani, Sylvia Richardson

Quantifying the similarity between datasets has widespread applications in statistics and machine learning. The performance of a predictive model on novel datasets, referred to as generalizability, depends on how similar the training and…

Methodology · Statistics 2025-06-18 Marieke Stolte, Franziska Kappenberg, Jörg Rahnenführer, Andrea Bommert

Reasoning with fuzzy sets can be achieved through measures such as similarity and distance. However, these measures can often give misleading results when considered independently, for example giving the same value for two different pairs…

Artificial Intelligence · Computer Science 2014-09-04 Josie McCulloch, Christian Wagner, Uwe Aickelin

This paper develops a Bayesian graphical model for fusing disparate types of count data. The motivating application is the study of bacterial communities from diverse high dimensional features, in this case transcripts, collected from…

Multimodal data fusion is a key approach for enhancing diagnosis in medical applications. We propose an asymmetric fusion strategy starting from a primary modality and integrating secondary modalities by disentangling shared and…

Computer Vision and Pattern Recognition · Computer Science 2025-09-22 Jérémie Stym-Popper, Nathan Painchaud, Clément Rambour, Pierre-Yves Courand, Nicolas Thome, Olivier Bernard

We consider comparisons of statistical learning algorithms using multiple data sets, via leave-one-in cross-study validation: each of the algorithms is trained on one data set; the resulting model is then validated on each remaining data…

Applications · Statistics 2015-06-02 Lorenzo Trippa, Levi Waldron, Curtis Huttenhower, Giovanni Parmigiani

The complementary information found in different modalities of patient data can aid in more accurate modelling of a patient's disease state and a better understanding of the underlying biological processes of a disease. However, the…

Machine Learning · Computer Science 2025-04-18 Annette Spooner, Mohammad Karimi Moridani, Azadeh Safarchi, Salim Maher, Fatemeh Vafaee, Amany Zekry, Arcot Sowmya

This article deals with the analysis of high dimensional data that come from multiple sources (experiments) and thus have different possibly correlated responses, but share the same set of predictors. The measurements of the predictors may…

Methodology · Statistics 2020-07-01 Guorong Dai, Ursula U. Müller, Raymond J. Carroll

Data heterogeneity is a prevalent issue, stemming from various conflicting factors, making its utilization complex. This uncertainty, particularly resulting from disparities in data formats, frequently necessitates the involvement of…