Related papers: Scalable Probabilistic Entity-Topic Modeling

Distributed Entity Disambiguation with Per-Mention Learning

Entity disambiguation, or mapping a phrase to its canonical representation in a knowledge base, is a fundamental step in many natural language processing applications. Existing techniques based on global ranking models fail to capture the…

Computation and Language · Computer Science 2016-04-21 Tiep Mai, Bichen Shi, Patrick K. Nicholson, Deepak Ajwani, Alessandra Sala

Embedded Topic Models Enhanced by Wikification

Topic modeling analyzes a collection of documents to learn meaningful patterns of words. However, previous topic models consider only the spelling of words and do not take into consideration the homography of words. In this study, we…

Computation and Language · Computer Science 2024-10-04 Takashi Shibuya, Takehito Utsuro

Probabilistic Bag-Of-Hyperlinks Model for Entity Linking

Many fundamental problems in natural language processing rely on determining what entities appear in a given text. Commonly referenced as entity linking, this step is a fundamental component of many NLP tasks such as text understanding,…

Computation and Language · Computer Science 2016-02-01 Octavian-Eugen Ganea, Marina Ganea, Aurelien Lucchi, Carsten Eickhoff, Thomas Hofmann

Using Variational Inference and MapReduce to Scale Topic Modeling

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for exploring document collections. Because of the increasing prevalence of large datasets, there is a need to improve the scalability of inference of LDA. In this…

Artificial Intelligence · Computer Science 2011-07-20 Ke Zhai, Jordan Boyd-Graber, Nima Asadi

Contextual Augmentation for Entity Linking using Large Language Models

Entity Linking involves detecting and linking entity mentions in natural language texts to a knowledge graph. Traditional methods use a two-step process with separate models for entity recognition and disambiguation, which can be…

Computation and Language · Computer Science 2025-10-23 Daniel Vollmers, Hamada M. Zahera, Diego Moussallem, Axel-Cyrille Ngonga Ngomo

Topic Similarity Networks: Visual Analytics for Large Document Sets

We investigate ways in which to improve the interpretability of LDA topic models by better analyzing and visualizing their outputs. We focus on examining what we refer to as topic similarity networks: graphs in which nodes represent latent…

Computation and Language · Computer Science 2014-09-29 Arun S. Maiya, Robert M. Rolfe

On a Topic Model for Sentences

Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, and for instance the grouping of words in coherent text…

Computation and Language · Computer Science 2016-06-02 Georgios Balikas, Massih-Reza Amini, Marianne Clausel

A high-reproducibility and high-accuracy method for automated topic classification

Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent…

Machine Learning · Statistics 2014-02-04 Andrea Lancichinetti, M. Irmak Sirer, Jane X. Wang, Daniel Acuna, Konrad Körding, Luís A. Nunes Amaral

Large scale link based latent Dirichlet allocation for web document classification

In this paper we demonstrate the applicability of latent Dirichlet allocation (LDA) for classifying large Web document collections. One of our main results is a novel influence model that gives a fully generative model of the document…

Information Retrieval · Computer Science 2010-06-28 István Bíró, Jácint Szabó

Model-Parallel Inference for Big Topic Models

In real world industrial applications of topic modeling, the ability to capture gigantic conceptual space by learning an ultra-high dimensional topical representation, i.e., the so-called "big model", is becoming the next desideratum after…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-11 Xun Zheng, Jin Kyu Kim, Qirong Ho, Eric P. Xing

Discovering topics with neural topic models built from PLSA assumptions

In this paper we present a model for unsupervised topic discovery in texts corpora. The proposed model uses documents, words, and topics lookup table embedding as neural network model parameters to build probabilities of words given topics,…

Computation and Language · Computer Science 2019-11-26 Sileye O. Ba

Source-LDA: Enhancing probabilistic topic models using prior knowledge sources

A popular approach to topic modeling involves extracting co-occurring n-grams of a corpus into semantic themes. The set of n-grams in a theme represents an underlying topic, but most topic modeling approaches are not able to label these…

Computation and Language · Computer Science 2017-05-19 Justin Wood, Patrick Tan, Wei Wang, Corey Arnold

Boosting Entity Linking Performance by Leveraging Unlabeled Documents

Modern entity linking systems rely on large collections of documents specifically annotated for the task (e.g., AIDA CoNLL). In contrast, we propose an approach which exploits only naturally occurring information: unlabeled documents and…

Computation and Language · Computer Science 2019-06-05 Phong Le, Ivan Titov

Entity Disambiguation with Entity Definitions

Local models have recently attained astounding performances in Entity Disambiguation (ED), with generative and extractive formulations being the most promising research directions. However, previous works limited their studies to using, as…

Computation and Language · Computer Science 2022-10-12 Luigi Procopio, Simone Conia, Edoardo Barba, Roberto Navigli

MetaLDA: a Topic Model that Efficiently Incorporates Meta information

Besides the text content, documents and their associated words usually come with rich sets of meta information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating…

Computation and Language · Computer Science 2017-09-20 He Zhao, Lan Du, Wray Buntine, Gang Liu

Beyond Word Embeddings: Learning Entity and Concept Representations from Large Scale Knowledge Bases

Text representations using neural word embeddings have proven effective in many NLP applications. Recent researches adapt the traditional word embedding models to learn vectors of multiword expressions (concepts/entities). However, these…

Computation and Language · Computer Science 2018-12-21 Walid Shalaby, Wlodek Zadrozny, Hongxia Jin

Leveraging Deep Neural Networks and Knowledge Graphs for Entity Disambiguation

Entity Disambiguation aims to link mentions of ambiguous entities to a knowledge base (e.g., Wikipedia). Modeling topical coherence is crucial for this task based on the assumption that information from the same semantic context tends to…

Computation and Language · Computer Science 2015-04-30 Hongzhao Huang, Larry Heck, Heng Ji

Automatic Labelling of Topics with Neural Embeddings

Topics generated by topic models are typically represented as list of terms. To reduce the cognitive overhead of interpreting these topics for end-users, we propose labelling a topic with a succinct phrase that summarises its theme or idea.…

Computation and Language · Computer Science 2016-12-26 Shraey Bhatia, Jey Han Lau, Timothy Baldwin

Joint Neural Entity Disambiguation with Output Space Search

In this paper, we present a novel model for entity disambiguation that combines both local contextual information and global evidences through Limited Discrepancy Search (LDS). Given an input document, we start from a complete solution…

Computation and Language · Computer Science 2019-08-23 Hamed Shahbazi, Xiaoli Z. Fern, Reza Ghaeini, Chao Ma, Rasha Obeidat, Prasad Tadepalli

Pangloss: Fast Entity Linking in Noisy Text Environments

Entity linking is the task of mapping potentially ambiguous terms in text to their constituent entities in a knowledge base like Wikipedia. This is useful for organizing content, extracting structured data from textual documents, and in…

Information Retrieval · Computer Science 2018-07-18 Michael Conover, Matthew Hayes, Scott Blackburn, Pete Skomoroch, Sam Shah