Related papers: Massive parallelization boosts big Bayesian multid…
Bayesian multidimensional scaling (BMDS) is a probabilistic dimension reduction tool that allows one to model and visualize data consisting of dissimilarities between pairs of objects. Although BMDS has proven useful within, e.g., Bayesian…
Multidimensional scaling (MDS) is widely used to reconstruct a low-dimensional representation of high-dimensional data while preserving pairwise distances. However, Bayesian MDS approaches based on Markov chain Monte Carlo (MCMC) face…
Machine learning models, and deep neural networks in particular, are increasingly deployed in risk-sensitive domains such as healthcare, environmental forecasting, and finance, where reliable quantification of predictive uncertainty is…
Markov chain Monte Carlo (MCMC) is the predominant tool used in Bayesian parameter estimation for hierarchical models. When the model expands due to an increasing number of hierarchical levels, number of groups at a particular level, or…
In the era of Big Data, scalable and accurate clustering algorithms for high-dimensional data are essential. We present new Bayesian Distance Clustering (BDC) models and inference algorithms with improved scalability while maintaining the…
We introduce a class of scalable Bayesian hierarchical models for the analysis of massive geostatistical datasets. The underlying idea combines ideas on high-dimensional geostatistics by partitioning the spatial domain and modeling the…
Massive datasets in the gigabyte and terabyte range combined with the availability of increasingly sophisticated statistical tools yield analyses at the boundary of what is computationally feasible. Compromising in the face of this…
Nonstationary non-Gaussian spatial data are common in many disciplines, including climate science, ecology, epidemiology, and social sciences. Examples include count data on disease incidence and binary satellite data on cloud mask…
We introduce a novel and scalable Bayesian framework for multivariate-density-density regression (DDR), designed to model relationships between multivariate distributions. Our approach addresses the critical issue of distributions residing…
Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or…
We present a set of algorithms implementing multidimensional scaling (MDS) for large data sets. MDS is a family of dimensionality reduction techniques using a $n \times n$ distance matrix as input, where $n$ is the number of individuals,…
Although Bayesian density estimation using discrete mixtures has good performance in modest dimensions, there is a lack of statistical and computational scalability to high-dimensional multivariate cases. To combat the curse of…
Multidimensional scaling (MDS) is a dimensionality reduction tool used for information analysis, data visualization and manifold learning. Most MDS procedures embed data points in low-dimensional Euclidean (flat) domains, such that…
Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for…
Viral mutations pose significant threats to public health by increasing infectivity, strengthening vaccine resistance, and altering disease severity. To track these evolving patterns, agencies like the CDC annually evaluate thousands of…
Bayesian models are a powerful tool for studying complex data, allowing the analyst to encode rich hierarchical dependencies and leverage prior information. Most importantly, they facilitate a complete characterization of uncertainty…
Multidimensional Scaling (MDS) is a classic technique that seeks vectorial representations for data points, given the pairwise distances between them. However, in recent years, data are usually collected from diverse sources or have…
With continued advances in Geographic Information Systems and related computational technologies, statisticians are often required to analyze very large spatial datasets. This has generated substantial interest over the last decade, already…
Large-scale observational health databases are increasingly popular for conducting comparative effectiveness and safety studies of medical products. However, increasing number of patients poses computational challenges when fitting survival…
Multivariate Gaussian processes (GPs) offer a powerful probabilistic framework to represent complex interdependent phenomena. They pose, however, significant computational challenges in high-dimensional settings, which frequently arise in…