Related papers: Fusing heterogeneous data sets
Data is a precious resource in today's society, and is generated at an unprecedented and constantly growing pace. The need to store, analyze, and make data promptly available to a multitude of users introduces formidable challenges in…
We propose a novel framework for combining datasets via alignment of their intrinsic geometry. This alignment can be used to fuse data originating from disparate modalities, or to correct batch effects while preserving intrinsic data…
We present a statistical mechanics approach to the protein folding problem. We first review some of the basic properties of proteins, and introduce some physical models to describe their thermodynamics. These models rely on a random…
Clustering mixed data presents numerous challenges inherent to the very heterogeneous nature of the variables. A clustering algorithm should be able, despite of this heterogeneity, to extract discriminant pieces of information from the…
Integrating data from multiple heterogeneous sources has become increasingly popular to achieve a large sample size and diverse study population. This paper reviews development in causal inference methods that combines multiple datasets…
We study the problem of parameter estimation for time-series possessing two, widely separated, characteristic time scales. The aim is to understand situations where it is desirable to fit a homogenized singlescale model to such multiscale…
This paper proposes a heterogenous density fusion approach to scalable multisensor multitarget tracking where the inter-connected sensors run different types of random finite set (RFS) filters according to their respective capacity and…
Every design choice will have different effects on different units. However traditional A/B tests are often underpowered to identify these heterogeneous effects. This is especially true when the set of unit-level attributes is…
Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of diverse data types, namely gene expression data.…
Background: Understanding the relationship between the Omics and the phenotype is a central problem in precision medicine. The high dimensionality of metabolomics data challenges learning algorithms in terms of scalability and…
Multimodal single-cell technologies enable the simultaneous collection of diverse data types from individual cells, enhancing our understanding of cellular states. However, the integration of these datatypes and modeling the…
Data depth has been applied as a nonparametric measurement for ranking multivariate samples. In this paper, we focus on homogeneity tests to assess whether two multivariate samples are from the same distribution. There are many data…
Current efforts in the biomedical sciences and related interdisciplinary fields are focused on gaining a molecular understanding of health and disease, which is a problem of daunting complexity that spans many orders of magnitude in…
We consider fits to two or more datasets for which results from the sa me experiment share a common systematic uncertainty in addition to their individ ual statistical errors. This is important in extracting the maximum information from a…
Mathematical models come in many forms across biological applications. In the case of complex, spatial dynamics and pattern formation, stochastic models also face two main challenges: pattern data is largely qualitative, and model…
An applied problem facing all areas of data science is harmonizing data sources. Joining data from multiple origins with unmapped and only partially overlapping features is a prerequisite to developing and testing robust, generalizable…
1. Animal movement patterns contribute to our understanding of variation in breeding success and survival of individuals, and the implications for population dynamics. 2. Over time, sensor technology for measuring movement patterns has…
Fusing probabilistic information is a fundamental task in signal and data processing with relevance to many fields of technology and science. In this work, we investigate the fusion of multiple probability density functions (pdfs) of a…
As researchers collect increasingly large molecular data sets to reconstruct the Tree of Life, the heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. A class of phylogenetic…
The exposome recognizes that individuals are exposed simultaneously to a multitude of different environmental factors and takes a holistic approach to the discovery of etiological factors for disease. However, challenges arise when trying…