Related papers: Combinatorial Topic Models using Small-Variance As…

200 papers

In this paper we present a model for unsupervised topic discovery in text corpora. The proposed model uses documents, words, and topics lookup table embedding as neural network model parameters to build probabilities of words given topics,…

Computation and Language · Computer Science 2019-11-26 Sileye O. Ba

Topic modeling is traditionally applied to word counts without accounting for the context in which words appear. Recent advancements in large language models (LLMs) offer contextualized word embeddings, which capture deeper meaning and…

Machine Learning · Statistics 2025-12-30 Morgane Austern, Yuanchuan Guo, Zheng Tracy Ke, Tianle Liu

We analyze methods for selecting topics in news articles to explain stock returns. We find, through empirical and theoretical results, that supervised Latent Dirichlet Allocation (sLDA) implemented through Gibbs sampling in a stochastic EM…

Statistical Finance · Quantitative Finance 2020-10-16 Paul Glasserman, Kriste Krstovski, Paul Laliberte, Harry Mamaysky

Topic models such as Latent Dirichlet Allocation (LDA) have been widely used in information retrieval for tasks ranging from smoothing and feedback methods to tools for exploratory search and discovery. However, classical methods for…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-20 Rolf Jagerman, Carsten Eickhoff, Maarten de Rijke

In real world industrial applications of topic modeling, the ability to capture gigantic conceptual space by learning an ultra-high dimensional topical representation, i.e., the so-called "big model", is becoming the next desideratum after…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-11 Xun Zheng, Jin Kyu Kim, Qirong Ho, Eric P. Xing

We develop a privatised stochastic variational inference method for Latent Dirichlet Allocation (LDA). The iterative nature of stochastic variational inference presents challenges: multiple iterations are required to obtain accurate…

Machine Learning · Statistics 2018-12-05 Mijung Park, James Foulds, Kamalika Chaudhuri, Max Welling

As electronically stored data grow in daily life, obtaining novel and relevant information becomes challenging in text mining. Thus people have sought statistical methods based on term frequency, matrix algebra, or topic modeling for text…

Information Retrieval · Computer Science 2019-07-04 Clint P. George, Wei Xia, George Michailidis

Topic modeling is a powerful technique for uncovering hidden themes within a collection of documents. However, the effectiveness of traditional topic models often relies on sufficient word co-occurrence, which is lacking in short texts.…

Computation and Language · Computer Science 2024-10-22 Pritom Saha Akash, Kevin Chen-Chuan Chang

The contribution of this paper is two-fold. First, we present Indexing by Latent Dirichlet Allocation (LDI), an automatic document indexing method. The probability distributions in LDI utilize those in Latent Dirichlet Allocation (LDA), a…

Information Retrieval · Computer Science 2014-12-12 Yanshan Wang, Jae-Sung Lee, In-Chan Choi

Using the 6,638 case descriptions of societal impact submitted for evaluation in the Research Excellence Framework (REF 2014), we replicate the topic model (Latent Dirichlet Allocation or LDA) made in this context and compare the results…

Computation and Language · Computer Science 2018-06-05 Tobias Hecking, Loet Leydesdorff

This article presents a probabilistic generative model for text based on semantic topics and syntactic classes called Part-of-Speech LDA (POSLDA). POSLDA simultaneously uncovers short-range syntactic patterns (syntax) and long-range…

Computation and Language · Computer Science 2013-03-13 William M. Darling, Fei Song

There has been an increasingly popular trend in Universities for curriculum transformation to make teaching more interactive and suitable for online courses. An increase in the popularity of online courses would result in an increase in the…

Information Retrieval · Computer Science 2020-11-03 Nikhil Fernandes, Alexandra Gkolia, Nicolas Pizzo, James Davenport, Akshar Nair

Topic modeling is widely used for uncovering thematic structures within text corpora, yet traditional models often struggle with specificity and coherence in domain-focused applications. Guided approaches, such as SeededLDA and CorEx,…

Computation and Language · Computer Science 2025-05-23 Chia-Hsuan Chang, Jui-Tse Tsai, Yi-Hang Tsai, San-Yih Hwang

Topic models are frequently used in machine learning owing to their high interpretability and modular structure. However, extending a topic model to include a supervisory signal, to incorporate pre-trained word embedding vectors and to…

Machine Learning · Statistics 2019-09-17 Ryohei Hisano

Latent Dirichlet allocation (LDA) is a widely-used probabilistic topic modeling paradigm, and recently finds many applications in computer vision and computational biology. In this paper, we propose a fast and accurate batch algorithm,…

Machine Learning · Computer Science 2014-04-09 Jia Zeng, Zhi-Qiang Liu, Xiao-Qin Cao

In this paper, we present hierarchical relation-based latent Dirichlet allocation (hrLDA), a data-driven hierarchical topic model for extracting terminological ontologies from a large number of heterogeneous documents. In contrast to…

Computation and Language · Computer Science 2020-01-10 Xiaofeng Zhu, Diego Klabjan, Patrick Bless

Pre-trained language models have led to a new state-of-the-art in many NLP tasks. However, for topic modeling, statistical generative models such as LDA are still prevalent, which do not easily allow incorporating contextual word vectors.…

Computation and Language · Computer Science 2024-02-13 Johannes Schneider

Learning to sample from intractable distributions over discrete sets without relying on corresponding training data is a central problem in a wide range of fields, including Combinatorial Optimization. Currently, popular deep learning-based…

Machine Learning · Computer Science 2025-08-25 Sebastian Sanokowski, Sepp Hochreiter, Sebastian Lehner

Topic models have achieved significant successes in analyzing large-scale text corpus. In practical applications, we are always confronted with the challenge of model selection, i.e., how to appropriately set the number of topics. Following…

Machine Learning · Statistics 2015-02-18 Dehua Cheng, Xinran He, Yan Liu

Recently, there has been considerable progress on designing algorithms with provable guarantees -- typically using linear algebraic methods -- for parameter learning in latent variable models. But designing provable algorithms for inference…

Machine Learning · Computer Science 2016-05-30 Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra