Related papers: Learning Topic Models - Going beyond SVD

200 papers

Nonnegative matrix factorization (NMF) based topic modeling methods rely little on model or data assumptions. However, they are usually formulated as difficult optimization problems, which may suffer from bad local minima and high…

Information Retrieval · Computer Science 2021-02-26 JianYu Wang , Xiao-Lei Zhang

Topic modeling is a technique for organizing and extracting themes from large collections of unstructured text. Non-negative matrix factorization (NMF) is a common unsupervised approach that decomposes a term frequency-inverse document…

Machine Learning · Computer Science 2024-07-30 Selma Wanna , Ryan Barron , Nick Solovyev , Maksim E. Eren , Manish Bhattarai , Kim Rasmussen , Boian S. Alexandrov

Topic models have been extensively used to organize and interpret the contents of large, unstructured corpora of text documents. Although topic models often perform well on traditional training vs. test set evaluations, it is often the case…

Computation and Language · Computer Science 2017-07-04 Kelsey MacMillan , James D. Wilson

As the amount of text data continues to grow, topic modeling is serving an important role in understanding the content hidden by the overwhelming quantity of documents. One popular topic modeling approach is non-negative matrix…

Information Retrieval · Computer Science 2022-08-23 Maksim E. Eren , Nick Solovyev , Manish Bhattarai , Kim Rasmussen , Charles Nicholas , Boian S. Alexandrov

Topic models, such as Latent Dirichlet Allocation (LDA), posit that documents are drawn from admixtures of distributions over words, known as topics. The inference problem of recovering topics from admixtures is NP-hard. Assuming…

Machine Learning · Statistics 2014-11-05 Trapit Bansal , Chiranjib Bhattacharyya , Ravindran Kannan

The probabilistic topic model imposes a low-rank structure on the expectation of the corpus matrix. Therefore, singular value decomposition (SVD) is a natural tool of dimension reduction. We propose an SVD-based method for estimating a…

Methodology · Statistics 2022-08-31 Zheng Tracy Ke , Minzhe Wang

Non-negative matrix factorization (NMF) based topic modeling is widely used in natural language processing (NLP) to uncover hidden topics of short text documents. Usually, training a high-quality topic model requires a large amount of textual…

Computation and Language · Computer Science 2022-05-27 Shijing Si , Jianzong Wang , Ruiyi Zhang , Qinliang Su , Jing Xiao

We propose new semi-supervised nonnegative matrix factorization (SSNMF) models for document classification and provide motivation for these models as maximum likelihood estimators. The proposed SSNMF models simultaneously provide both a…

We utilize a recently developed topic modeling method called SeNMFk, extending the standard Non-negative Matrix Factorization (NMF) methods by incorporating the semantic structure of the text, and adding a robust system for determining the…

Digital Libraries · Computer Science 2022-01-04 Valentin Stanev , Erik Skau , Ichiro Takeuchi , Boian S. Alexandrov

Topic models have been prevalent for decades to discover latent topics and infer topic proportions of documents in an unsupervised fashion. They have been widely used in various applications like text analysis and context recommendation.…

Computation and Language · Computer Science 2024-06-25 Xiaobao Wu , Thong Nguyen , Anh Tuan Luu

Topic models have been widely used to learn text representations and gain insight into document corpora. To perform topic discovery, most existing neural models either take document bag-of-words (BoW) or sequence of tokens as input followed…

Computation and Language · Computer Science 2021-07-12 Madhur Panwar , Shashank Shailabh , Milan Aggarwal , Balaji Krishnamurthy

Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to…

Machine Learning · Computer Science 2022-02-01 Pengyu Li , Christine Tseng , Yaxuan Zheng , Joyce A. Chew , Longxiu Huang , Benjamin Jarman , Deanna Needell

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic…

Information Retrieval · Computer Science 2019-07-12 Adji B. Dieng , Francisco J. R. Ruiz , David M. Blei

Fully unsupervised topic models have found fantastic success in document clustering and classification. However, these models often suffer from the tendency to learn less-than-meaningful or even redundant topics when the data is biased…

Machine Learning · Computer Science 2021-02-08 Joshua Vendrow , Jamie Haddock , Elizaveta Rebrova , Deanna Needell

Topic modelling was mostly dominated by Bayesian graphical models during the last decade. With the rise of transformers in Natural Language Processing, however, several successful models that rely on straightforward clustering approaches in…

Machine Learning · Computer Science 2024-03-07 Arik Reuter , Anton Thielmann , Christoph Weisser , Benjamin Säfken , Thomas Kneib

One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach which infers the latent topical structure of a…

Machine Learning · Statistics 2018-07-20 Martin Gerlach , Tiago P. Peixoto , Eduardo G. Altmann

In this work, we apply topic modeling using Non-Negative Matrix Factorization (NMF) on the COVID-19 Open Research Dataset (CORD-19) to uncover the underlying thematic structure and its evolution within the extensive body of COVID-19…

Computation and Language · Computer Science 2025-03-25 Divya Patel , Vansh Parikh , Om Patel , Agam Shah , Bhaskar Chaudhury

We describe the use of Non-Negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA) algorithms to perform topic mining and labelling applied to retail customer communications in an attempt to characterize the subject of…

Machine Learning · Computer Science 2019-12-20 Rashid Mehdiyev , Jean Nava , Karan Sodhi , Saurav Acharya , Annie Ibrahim Rana

Topic models are frequently used in machine learning owing to their high interpretability and modular structure. However, extending a topic model to include a supervisory signal, to incorporate pre-trained word embedding vectors and to…

Machine Learning · Statistics 2019-09-17 Ryohei Hisano

Language model based methods are powerful techniques for text classification. However, the models have several shortcomings. (1) It is difficult to integrate human knowledge such as keywords. (2) It needs a lot of resources to train the…

Computation and Language · Computer Science 2024-02-09 Weijie Xu , Jay Desai , Srinivasan Sengamedu , Xiaoyu Jiang , Francis Iannacci