Related papers: A Divide and Conquer Algorithm of Bayesian Density…
We study Bayesian estimation of finite mixture models in a general setup where the number of components is unknown and allowed to grow with the sample size. The assumption of a growing number of components is a natural one, as the degree of…
Divide-and-conquer Bayesian methods consist of three steps: dividing the data into smaller computationally manageable subsets, running a sampling algorithm in parallel on all the subsets, and combining parameter draws from all the subsets.…
Although Bayesian density estimation using discrete mixtures has good performance in modest dimensions, there is a lack of statistical and computational scalability to high-dimensional multivariate cases. To combat the curse of…
Advances in information technology have led to extremely large datasets that are often kept in different storage centers. Existing statistical methods must be adapted to overcome the resulting computational obstacles while retaining…
Finite mixtures of Gaussian distributions provide a flexible semi-parametric methodology for density estimation when the variables under investigation have no boundaries. However, in practical applications variables may be partially bounded…
Suppose $X_1,\dots, X_n$ is a random sample from a bounded and decreasing density $f_0$ on $[0,\infty)$. We are interested in estimating such $f_0$, with special interest in $f_0(0)$. This problem is encountered in various statistical…
In Bayesian inference for mixture models with an unknown number of components, a finite mixture model is usually employed that assumes prior distributions for mixing weights and the number of components. This model is called a mixture of…
Divide-and-conquer methods use large-sample approximations to provide frequentist guarantees when each block of data is both small enough to facilitate efficient computation and large enough to support approximately valid inferences. When…
We focus on Bayesian inverse problems with Gaussian likelihood, linear forward model, and priors that can be formulated as a Gaussian mixture. Such a mixture is expressed as an integral of Gaussian density functions weighted by a mixing…
Divide-and-conquer based methods for Bayesian inference provide a general approach for tractable posterior inference when the sample size is large. These methods divide the data into smaller subsets, sample from the posterior distribution…
We consider estimating the parameters of a Gaussian mixture density with a given number of components best representing a given set of weighted samples. We adopt a density interpretation of the samples by viewing them as a discrete Dirac…
While Bayesian inference provides a principled framework for reasoning under uncertainty, its widespread adoption is limited by the intractability of exact posterior computation, necessitating the use of approximate inference. However,…
Existing methods to summarize posterior inference for mixture models focus on identifying a point estimate of the implied random partition for clustering, with density estimation as a secondary goal (Wade and Ghahramani, 2018; Dahl et al.,…
Density estimation is an interdisciplinary topic at the intersection of statistics, theoretical computer science and machine learning. We review some old and new techniques for bounding the sample complexity of estimating densities of…
Recent advancements in solving Bayesian inverse problems have spotlighted denoising diffusion models (DDMs) as effective priors. Although these have great potential, DDM priors yield complex posterior distributions that are challenging to…
As an alternative to variable selection or shrinkage in high dimensional regression, we propose to randomly compress the predictors prior to analysis. This dramatically reduces storage and computational bottlenecks, performing well when the…
We study the sparse high-dimensional Gaussian mixture model when the number of clusters is allowed to grow with the sample size. A minimax lower bound for parameter estimation is established, and we show that a constrained maximum…
Computer experiments are becoming increasingly important in scientific investigations. In the presence of uncertainty, analysts employ probabilistic sensitivity methods to identify the key drivers of change in the quantities of interest.…
Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden required to handle large data sets that cannot be stored or processed on one machine. Another…
In many modern applications, there is interest in analyzing enormous data sets that cannot be easily moved across computers or loaded into memory on a single computer. In such settings, it is very common to be interested in clustering.…