Related papers: Distributed Computation for Marginal Likelihood ba…
Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden required to handle large data sets that cannot be stored or processed on one machine. Another…
Bayesian model selection provides a powerful framework for objectively comparing models directly from observed data, without reference to ground truth data. However, Bayesian model selection requires the computation of the marginal…
Divide-and-conquer Bayesian methods consist of three steps: dividing the data into smaller computationally manageable subsets, running a sampling algorithm in parallel on all the subsets, and combining parameter draws from all the subsets.…
Inference of the marginal probability distribution is defined as the calculation of the probability of a subset of the variables and is relevant for handling missing data and hidden variables. While inference of the marginal probability…
Many machine learning applications require operating on a spatially distributed dataset. Despite technological advances, privacy considerations and communication constraints may prevent gathering the entire dataset in a central unit. In…
Estimating copulas with discrete marginal distributions is challenging, especially in high dimensions, because computing the likelihood contribution of each observation requires evaluating $2^{J}$ terms, with $J$ the number of discrete…
We propose a distributed computing framework, based on a divide and conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of…
We propose a novel Bayesian model selection technique on linear mixed-effects models to compare multiple treatments with a control. A fully Bayesian approach is implemented to estimate the marginal inclusion probabilities that provide a…
For Bayesian computation in big data contexts, the divide-and-conquer MCMC concept splits the whole data set into batches, runs MCMC algorithms separately over each batch to produce samples of parameters, and combines them to produce an…
It is common practice to use Laplace approximations to compute marginal likelihoods in Bayesian versions of generalised linear models (GLM). Marginal likelihoods combined with model priors are then used in different search algorithms to…
We develop sampling algorithms to fit Bayesian hierarchical models, the computational complexity of which scales linearly with the number of observations and the number of parameters in the model. We focus on crossed random effect and…
We propose and analyze a generalized splitting method to sample approximately from a distribution conditional on the occurrence of a rare event. This has important applications in a variety of contexts in operations research, engineering,…
This paper describes a Bayesian method for learning causal networks using samples that were selected in a non-random manner from a population of interest. Examples of data obtained by non-random sampling include convenience samples and…
We present a Bayesian data fusion method to approximate a posterior distribution from an ensemble of particle estimates that only have access to subsets of the data. Our approach relies on approximate probabilistic inference of model…
Monte Carlo algorithms, such as Markov chain Monte Carlo (MCMC) and Hamiltonian Monte Carlo (HMC), are routinely used for Bayesian inference in generalized linear models; however, these algorithms are prohibitively slow in massive data…
Bayesian nonparametric (BNP) models provide elegant methods for discovering underlying latent features within a data set, but inference in such models can be slow. We exploit the fact that completely random measures, which commonly used…
We develop a Bayesian approach for selecting the model which is the most supported by the data within a class of marginal models for categorical variables formulated through equality and/or inequality constraints on generalised logits…
In many modern applications, there is interest in analyzing enormous data sets that cannot be easily moved across computers or loaded into memory on a single computer. In such settings, it is very common to be interested in clustering.…
Many inference problems involve inferring the number $N$ of components in some region, along with their properties $\{\mathbf{x}_i\}_{i=1}^N$, from a dataset $\mathcal{D}$. A common statistical example is finite mixture modelling. In the…
We propose a novel Bayesian inference framework for distributed differentially private linear regression. We consider a distributed setting where multiple parties hold parts of the data and share certain summary statistics of their portions…