Related papers: Fusing heterogeneous data sets
In this paper, we present an information-theoretic method for clustering mixed-type data, that is, data consisting of both continuous and categorical variables. The proposed approach extends the Information Bottleneck principle to…
Cellular heterogeneity is an immanent property of biological systems that covers very different aspects of life ranging from genetic diversity to cell-to-cell variability driven by stochastic molecular interactions, and noise induced cell…
The rapid advancement of high-throughput sequencing and other assay technologies has resulted in the generation of large and complex multi-omics datasets, offering unprecedented opportunities for advancing precision medicine strategies.…
This article proposes a powerful scheme to monitor a large number of categorical data streams with heterogeneous parameters or nature. The data streams considered may be either nominal with a number of attribute levels or ordinal with some…
The scarcity of well-annotated medical datasets requires leveraging transfer learning from broader datasets like ImageNet or pre-trained models like CLIP. Model soups averages multiple fine-tuned models aiming to improve performance on…
Quantitative evidence synthesis methods aim to combine data from multiple medical trials to infer relative effects of different interventions. A challenge arises when trials report continuous outcomes on different measurement scales. To…
Data come in many forms. From a shallow perspective, they can be viewed as being either in structured (e.g., as a relation, as key-value pairs) or unstructured (e.g., text, image) formats. So far, machines have been fairly good at…
One of the major research questions regarding human microbiome studies is the feasibility of designing interventions that modulate the composition of the microbiome to promote health and cure disease. This requires extensive understanding…
Multiple sets of measurements on the same objects obtained from different platforms may reflect partially complementary information of the studied system. The integrative analysis of such data sets not only provides us with the opportunity…
Entity information network is used to describe structural relationships between entities. Taking advantage of its extension and heterogeneity, entity information network is more and more widely applied to relationship modeling. Recent…
In recent years, machine learning has demonstrated impressive capability in handling molecular science tasks. To support various molecular properties at scale, machine learning models are trained in the multi-task learning paradigm.…
Nowadays, journalism is facilitated by the existence of large amounts of digital data sources, including many Open Data ones. Such data sources are extremely heterogeneous, ranging from highly structured (relational databases),…
This paper proposes a new approach to multi-sensor data fusion. It suggests that aggregation of data from multiple sensors can be done more efficiently when we consider information about sensors' different characteristics. Similar to most…
Many data sets contain an inherent multilevel structure, for example, because of repeated measurements of the same observational units. Taking this structure into account is critical for the accuracy and calibration of any statistical…
Multimodal fusion focuses on integrating information from multiple modalities with the goal of more accurate prediction, which has achieved remarkable progress in a wide range of scenarios, including autonomous driving and medical…
Suppose that we are interested in the comparison of two independent categorical variables. Suppose also that the population is divided into subpopulations or groups. Notice that the distribution of the target variable may vary across…
Recently developed technologies to generate single-cell genomic data have made a revolutionary impact in the field of biology. Multi-omics assays offer even greater opportunities to understand cellular states and biological processes.…
We study stochastic particle systems made up of heterogeneous units. We introduce a general framework suitable for analytically studying systems of this kind and apply it to two particular models of interest in economics and epidemiology. We show…
Modern TEM instrumentation can probe a wide range of structural, optical, and chemical properties with unprecedented resolution. However, each of these properties must be recorded in independent datasets using different detector modes with…
Genetic data are frequently categorical and have complex dependence structures that are not always well understood. For this reason, clustering and classification based on genetic data, while highly relevant, are challenging statistical…