Related papers: On Estimation and Selection for Topic Models
Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, lacking in the literature is a formal…
In an effort to develop topic modeling methods that can be quickly applied to large data sets, we revisit the problem of maximum-likelihood estimation in topic models. It is known, at least informally, that maximum-likelihood estimation in…
Likelihood based-learning of graphical models faces challenges of computational-complexity and robustness to model mis-specification. This paper studies methods that fit parameters directly to maximize a measure of the accuracy of predicted…
Supervised topic models utilize document's side information for discovering predictive low dimensional representations of documents. Existing models apply the likelihood-based estimation. In this paper, we present a general framework of…
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice. One class of methods uses data simulated with different parameters to infer models of the likelihood-to-evidence…
Topic models have become popular tools for dimension reduction and exploratory analysis of text data which consists in observed frequencies of a vocabulary of $p$ words in $n$ documents, stored in a $p\times n$ matrix. The main premise is…
Topic models have achieved significant successes in analyzing large-scale text corpus. In practical applications, we are always confronted with the challenge of model selection, i.e., how to appropriately set the number of topics. Following…
We propose a neural network based approach for learning topics from text and image datasets. The model makes no assumptions about the conditional distribution of the observed features given the latent topics. This allows us to perform topic…
We propose a new method of estimation in topic models, that is not a variation on the existing simplex finding algorithms, and that estimates the number of topics K from the observed data. We derive new finite sample minimax lower bounds…
We here introduce a novel classification approach adopted from the nonlinear model identification framework, which jointly addresses the feature selection and classifier design tasks. The classifier is constructed as a polynomial expansion…
Topic models have emerged as fundamental tools in unsupervised machine learning. Most modern topic modeling algorithms take a probabilistic view and derive inference algorithms based on Latent Dirichlet Allocation (LDA) or its variants. In…
Unsupervised estimation of latent variable models is a fundamental problem central to numerous applications of machine learning and statistics. This work presents a principled approach for estimating broad classes of such models, including…
Variational inference is a very efficient and popular heuristic used in various forms in the context of latent variable models. It's closely related to Expectation Maximization (EM), and is applied when exact EM is computationally…
Recently, topic modeling has been widely used to discover the abstract topics in text corpora. Most of the existing topic models are based on the assumption of three-layer hierarchical Bayesian structure, i.e. each document is modeled as a…
In this paper we present a modification to a latent topic model, which makes the model exploit supervision to produce a factorized representation of the observed data. The structured parameterization separately encodes variance that is…
We propose a topic modeling approach to the prediction of preferences in pairwise comparisons. We develop a new generative model for pairwise comparisons that accounts for multiple shared latent rankings that are prevalent in a population…
Probabilistic topic models such as latent Dirichlet allocation (LDA) are popularly used with Bayesian inference methods such as Gibbs sampling to learn posterior distributions over topic model parameters. We derive a novel measure of LDA…
Current topic models often suffer from discovering topics not matching human intuition, unnatural switching of topics within documents and high computational demands. We address these concerns by proposing a topic model and an inference…
The estimation of parameters from data is a common problem in many areas of the physical sciences, and frequently used algorithms rely on sets of simulated data which are fit to data. In this article, an analytic solution for…
Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model inference have been based on a maximum likelihood objective. Efficient algorithms exist…