Related papers: Shared kernel Bayesian screening
Single-cell technologies offer insights into molecular feature distributions, but comparing them poses challenges. We propose a kernel-testing framework for non-linear cell-wise distribution comparison, analyzing gene expression and…
In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while the development in the Bayesian setting is limited where…
The Galleri® (GRAIL) multi-cancer early detection test measures circulating tumour DNA (ctDNA) to predict the presence of more than 50 different cancers, from a blood test. If sensitivity of the test to detect early-stage cancers is…
We present a Bayesian mixture model for estimating the joint distribution of mixed ordinal, nominal, and continuous data conditional on a set of fixed variables. The model uses multivariate normal and categorical mixture kernels for the…
Multiple kernel learning algorithms are proposed to combine kernels in order to obtain a better similarity measure or to integrate feature representations coming from different data sources. Most of the previous research on such methods is…
We propose a Bayesian test of normality for univariate or multivariate data against alternative nonparametric models characterized by Dirichlet process mixture distributions. The alternative models are based on the principles of embedding…
The analysis of large-scale datasets, especially in biomedical contexts, frequently involves a principled screening of multiple hypotheses. The celebrated two-group model jointly models the distribution of the test statistics with mixtures…
Next-generation sequencing technologies now constitute a method of choice to measure gene expression. Data to analyze are read counts, commonly modeled using Negative Binomial distributions. A relevant issue associated with this…
This paper introduces the kernel mixture network, a new method for nonparametric estimation of conditional probability densities using neural networks. We model arbitrarily complex conditional densities as linear combinations of a family of…
Kernel methods are one of the mainstays of machine learning, but the problem of kernel learning remains challenging, with only a few heuristics and very little theory. This is of particular importance in methods based on estimation of…
We propose novel kernel-based tests for assessing the equivalence between distributions. Traditional goodness-of-fit testing is inappropriate for concluding the absence of distributional differences, because failure to reject the null…
We provide a distribution-free test that can be used to determine whether any two joint distributions $p$ and $q$ are statistically different by inspection of a large enough set of samples. Following recent efforts from Long et al. [1], we…
High-throughput genetic and epigenetic data are often screened for associations with an observed phenotype. For example, one may wish to test hundreds of thousands of genetic variants, or DNA methylation sites, for an association with…
In this paper, we propose a test for the equality of multiple distributions based on kernel mean embeddings. Our framework provides a flexible way to handle multivariate or even high-dimensional data by virtue of kernel methods and allows…
Discrete mixture models are one of the most successful approaches for density estimation. Under a Bayesian nonparametric framework, the Dirichlet process location-scale mixture of Gaussian kernels is the gold standard, both having nice…
Applying machine learning to biological sequences - DNA, RNA and protein - has enormous potential to advance human health, environmental sustainability, and fundamental biological understanding. However, many existing machine learning…
To improve the predictability of complex computational models in the experimentally-unknown domains, we propose a Bayesian statistical machine learning framework utilizing the Dirichlet distribution that combines results of several…
Test equating using covariates may be applied to provide comparable scores from multiple test forms when no anchor items are available. However, its performance may be compromised if some of the covariates themselves are measured using…
Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue. However, whether these more…
Marginalising over families of Gaussian Process kernels produces flexible model classes with well-calibrated uncertainty estimates. Existing approaches require likelihood evaluations of many kernels, rendering them prohibitively expensive…