Related papers: Scalable Extreme Deconvolution
Density estimation is a fundamental problem that arises in many areas of astronomy, with applications ranging from selecting quasars using color distributions to characterizing stellar abundances. Astronomical observations are inevitably…
We present initial results on the use of Mixture Models for density estimation in large astronomical databases. We provide herein both the theoretical and experimental background for using a mixture model of Gaussians based on the…
We generalize the well-known mixtures of Gaussians approach to density estimation and the accompanying Expectation--Maximization technique for finding the maximum likelihood parameters of the mixture to the case where each data point…
In this paper, we outline the use of Mixture Models in density estimation of large astronomical databases. This method of density estimation has been known in Statistics for some time but has not been implemented because of the large…
We propose an adaptive learning procedure to learn patch-based image priors for image denoising. The new algorithm, called the Expectation-Maximization (EM) adaptation, takes a generic prior learned from a generic external database and…
We describe two new open source tools written in Python for performing extreme deconvolution Gaussian mixture modeling (XDGMM) and using a conditioned model to re-sample observed supernova and host galaxy populations. XDGMM is new program…
Data clustering has received a lot of attention and numerous methods, algorithms and software packages are available. Among these techniques, parametric finite-mixture models play a central role due to their interesting mathematical…
The maximum entropy method (MEM) is a well known deconvolution technique in radio-interferometry. This method solves a non-linear optimization problem with an entropy regularization term. Other heuristics such as CLEAN are faster but highly…
The Expectation-Maximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. For estimating mixtures of Gaussians, its iteration can be viewed as a soft version of the k-means…
The expectation-maximization (EM) algorithm is an iterative method for finding maximum likelihood estimates when data are incomplete or are treated as being incomplete. The EM algorithm and its variants are commonly used for parameter…
Gaussian mixture models (GMMs) are fundamental statistical tools for modeling heterogeneous data. Due to the nonconcavity of the likelihood function, the Expectation-Maximization (EM) algorithm is widely used for parameter estimation of…
We study a class of weakly identifiable location-scale mixture models for which the maximum likelihood estimates based on $n$ i.i.d. samples are known to have lower accuracy than the classical $n^{- \frac{1}{2}}$ error. We investigate…
This paper represents a preliminary (pre-reviewing) version of a sublinear variational algorithm for isotropic Gaussian mixture models (GMMs). Further developments of the algorithm for GMMs with diagonal covariance matrices (instead of…
Extreme classification seeks to assign each data point, the most relevant labels from a universe of a million or more labels. This task is faced with the dual challenge of high precision and scalability, with millisecond level prediction…
Bayesian graphical models are a useful tool for understanding dependence relationships among many variables, particularly in situations with external prior information. In high-dimensional settings, the space of possible graphs becomes…
Probit models are useful for modeling correlated discrete responses in many disciplines, including consumer choice data in economics and marketing. However, the Gaussian latent variable feature of probit models coupled with identification…
Expectation-Maximization (EM) algorithm is a widely used iterative algorithm for computing maximum likelihood estimate when dealing with Gaussian Mixture Model (GMM). When the sample size is smaller than the data dimension, this could lead…
The EM algorithm is one of the most popular algorithm for inference in latent data models. The original formulation of the EM algorithm does not scale to large data set, because the whole data set is required at each iteration of the…
Observations of exoplanet atmospheres in high resolution have the potential to resolve individual planetary absorption lines, despite the issues associated with ground-based observations. The removal of contaminating stellar and telluric…
Partially recorded data are frequently encountered in many applications and usually clustered by first removing incomplete cases or features with missing values, or by imputing missing values, followed by application of a clustering…