Related papers: Communication-Free Parallel Supervised Topic Model…
Communication costs, resulting from synchronization requirements during learning, can greatly slow down many parallel machine learning algorithms. In this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in which…
Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We…
We consider parallel asynchronous Markov Chain Monte Carlo (MCMC) sampling for problems where we can leverage (stochastic) gradients to define continuous dynamics which explore the target distribution. We outline a solution strategy for…
Multi-label classification (MLC) refers to the problem of tagging a given instance with a set of relevant labels. Most existing MLC methods are based on the assumption that the correlation of two labels in each label pair is symmetric,…
Self-learning Monte Carlo (SLMC) methods are recently proposed to accelerate Markov chain Monte Carlo (MCMC) methods using a machine learning model. With latent generative models, SLMC methods realize efficient Monte Carlo updates with less…
Markov chain Monte Carlo (MCMC) is a powerful methodology for the approximation of posterior distributions. However, the iterative nature of MCMC does not naturally facilitate its use with modern highly parallel computation on HPC and cloud…
We propose a sequential Markov chain Monte Carlo (SMCMC) algorithm to sample from a sequence of probability distributions, corresponding to posterior distributions at different times in on-line applications. SMCMC proceeds as in usual MCMC…
Markov Chain Monte Carlo (MCMC) algorithms are essential tools in computational statistics for sampling from unnormalised probability distributions, but can be fragile when targeting high-dimensional, multimodal, or complex target…
Multimodal models ideally should generalize to unseen domains while remaining data-efficient to reduce annotation costs. To this end, we introduce and study a new problem, Semi-Supervised Multimodal Domain Generalization (SSMDG), which aims…
As electronically stored data grow in daily life, obtaining novel and relevant information becomes challenging in text mining. Thus people have sought statistical methods based on term frequency, matrix algebra, or topic modeling for text…
High-dimensional distributions, especially those with heavy tails, are notoriously difficult for off-the-shelf MCMC samplers: the combination of unbounded state spaces, diminishing gradient information, and local moves results in…
We develop a novel Markov chain Monte Carlo (MCMC) method that exploits a hierarchy of models of increasing complexity to efficiently generate samples from an unnormalized target distribution. Broadly, the method rewrites the Multilevel…
Performing numerical integration when the integrand itself cannot be evaluated point-wise is a challenging task that arises in statistical analysis, notably in Bayesian inference for models with intractable likelihood functions. Markov…
As it has become common to use many computer cores in routine applications, finding good ways to parallelize popular algorithms has become increasingly important. In this paper, we present a parallelization scheme for Markov chain Monte…
For large matrix factorisation problems, we develop a distributed Markov Chain Monte Carlo (MCMC) method based on stochastic gradient Langevin dynamics (SGLD) that we call Parallel SGLD (PSGLD). PSGLD has very favourable scaling properties…
Performing Bayesian inference via Markov chain Monte Carlo (MCMC) can be exceedingly expensive when posterior evaluations invoke the evaluation of a computationally expensive model, such as a system of partial differential equations. In…
We propose a generic Markov Chain Monte Carlo (MCMC) algorithm to speed up computations for datasets with many observations. A key feature of our approach is the use of the highly efficient difference estimator from the survey sampling…
Learning meaningful topic models with massive document collections which contain millions of documents and billions of tokens is challenging because of two reasons: First, one needs to deal with a large number of topics (typically in the…
Markov Chain Monte Carlo (MCMC) is a well-established family of algorithms primarily used in Bayesian statistics to sample from a target distribution when direct sampling is challenging. Existing work on Bayesian decision trees uses MCMC.…
We describe an embarrassingly parallel, anytime Monte Carlo method for likelihood-free models. The algorithm starts with the view that the stochasticity of the pseudo-samples generated by the simulator can be controlled externally by a…