Related papers: Combinatorial Topic Models using Small-Variance As…

200 papers

Conventional bag-of-words approaches for topic modeling, like latent Dirichlet allocation (LDA), struggle with literary text. Literature challenges lexical methods because narrative language focuses on immersive sensory details instead of…

Computation and Language · Computer Science 2025-05-30 Li Lucy, Camilla Griffiths, Sarah Levine, Jennifer L. Eberhardt, Dorottya Demszky, David Bamman

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for hidden semantic discovery of text data and serves as a fundamental tool for text analysis in various applications. However, the LDA model as well as the training…

Machine Learning · Computer Science 2020-10-12 Fangyuan Zhao, Xuebin Ren, Shusen Yang, Qing Han, Peng Zhao, Xinyu Yang

While generative models such as Latent Dirichlet Allocation (LDA) have proven fruitful in topic modeling, they often require detailed assumptions and careful specification of hyperparameters. Such model complexity issues only compound when…

Computation and Language · Computer Science 2018-09-05 Ryan J. Gallagher, Kyle Reing, David Kale, Greg Ver Steeg

Large language models (LLMs) can produce long, coherent passages of text, suggesting that LLMs, although trained on next-word prediction, must represent the latent structure that characterizes a document. Prior work has found that internal…

Computation and Language · Computer Science 2023-12-25 Liyi Zhang, R. Thomas McCoy, Theodore R. Sumers, Jian-Qiao Zhu, Thomas L. Griffiths

Contextualised word vectors obtained via pre-trained language models encode a variety of knowledge that has already been exploited in applications. Complementary to these language models are probabilistic topic models that learn thematic…

Computation and Language · Computer Science 2023-01-12 Mozhgan Talebpour, Alba Garcia Seco de Herrera, Shoaib Jameel

Topic models are often used to identify human-interpretable topics to help make sense of large document collections. We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers. Our…

Computation and Language · Computer Science 2020-10-07 Alexander Hoyle, Pranav Goel, Philip Resnik

Network-based procedures for topic detection in huge text collections offer an intuitive alternative to probabilistic topic models. We present in detail a method that is especially designed with the requirements of domain experts in mind.…

Computation and Language · Computer Science 2021-07-27 Andreas Hamm, Simon Odrowski

We investigate the problem of learning a topic model - the well-known Latent Dirichlet Allocation - in a distributed manner, using a cluster of C processors and dividing the corpus to be learned equally among them. We propose a simple…

Machine Learning · Computer Science 2009-09-28 James Petterson, Tiberio Caetano

Latent variable models (LVMs) represent observed variables by parameterized functions of latent variables. Prominent examples of LVMs for unsupervised learning are probabilistic PCA or probabilistic SC which both assume a weighted linear…

Machine Learning · Computer Science 2023-12-18 Hamid Mousavi, Jakob Drefs, Florian Hirschberger, Jörg Lücke

Recent representation learning approaches enhance neural topic models by optimizing the weighted linear combination of the evidence lower bound (ELBO) of the log-likelihood and the contrastive learning objective that contrasts pairs of…

Computation and Language · Computer Science 2025-07-15 Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu

Collaborative Topic Regression (CTR) combines ideas of probabilistic matrix factorization (PMF) and topic modeling (e.g., LDA) for recommender systems, which has gained increasing successes in many applications. Despite enjoying many…

Machine Learning · Computer Science 2016-05-31 Chenghao Liu, Tao Jin, Steven C. H. Hoi, Peilin Zhao, Jianling Sun

The main goal of this paper is to explore latent topic analysis (LTA), in the context of quantum information retrieval. LTA is a valuable technique for document analysis and representation, which has been extensively used in information…

Machine Learning · Computer Science 2019-03-08 Fabio A. González, Juan C. Caicedo

Creating impact in real-world settings requires artificial intelligence techniques to span the full pipeline from data, to predictive models, to decisions. These components are typically approached separately: a machine learning model is…

Machine Learning · Computer Science 2018-11-22 Bryan Wilder, Bistra Dilkina, Milind Tambe

Research background: With the continuous development of society, consumers pay more attention to the key information of product fine-grained attributes when shopping. Research purposes: This study will fine tune the Sentence-BERT word…

Computation and Language · Computer Science 2025-04-14 Jianheng Li, Lirong Chen

Discriminative features play an important role in image and object classification and also in other fields of research such as semi-supervised learning, fine-grained classification, out of distribution detection. Inspired by Linear…

Computer Vision and Pattern Recognition · Computer Science 2021-07-14 Mai Lan Ha, Gianni Franchi, Emanuel Aldea, Volker Blanz

Topic modelling is a text mining technique for identifying salient themes from a number of documents. The output is commonly a set of topics consisting of isolated tokens that often co-occur in such documents. Manual effort is often…

Computation and Language · Computer Science 2024-04-26 Lowri Williams, Eirini Anthi, Laura Arman, Pete Burnap

Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial…

Information Retrieval · Computer Science 2012-06-18 David Mimno, Andrew McCallum

Crowdfunding in the realm of the Social Web has received substantial attention, with prior research examining various aspects of campaigns, including project objectives, durations, and influential project categories for successful…

Computation and Language · Computer Science 2024-01-09 Prathamesh Muzumdar, George Kurian, Ganga Prasad Basyal

The latent Dirichlet allocation (LDA) model is a widely-used latent variable model in machine learning for text analysis. Inference for this model typically involves a single-site collapsed Gibbs sampling step for latent variables…

Computation · Statistics 2016-08-03 Xin Zhang, Scott A. Sisson

Topic models are popular for modeling discrete data (e.g., texts, images, videos, links), and provide an efficient way to discover hidden structures/semantics in massive data. One of the core problems in this field is the posterior…

Machine Learning · Statistics 2015-12-11 Khoat Than, Tu Bao Ho