Related papers: Structural Regularities in Text-based Entity Vecto…
Unsupervised learning of low-dimensional, semantic representations of words and entities has recently gained attention. In this paper we describe the Semantic Entity Retrieval Toolkit (SERT) that provides implementations of our previously…
Keyword-based information processing has limitations due to simple treatment of words. In this paper, we introduce named entities as objectives into document clustering, which are the key elements defining document semantics and in many…
Most of the Natural Language Processing systems are involved in entity-based processing for several tasks like Information Extraction, Question-Answering, Text-Summarization and so on. A new challenge comes when entities play roles…
Entity Recognition (ER) within a text is a fundamental exercise in Natural Language Processing, enabling further depending tasks such as Knowledge Extraction, Text Summarisation, or Keyphrase Extraction. An entity consists of single words…
Knowledge is captured in the form of entities and their relationships and stored in knowledge graphs. Knowledge graphs enhance the capabilities of applications in many different areas including Web search, recommendation, and natural…
Entity linking, the task of mapping textual mentions to known entities, has recently been tackled using contextualized neural networks. We address the question whether these results -- reported for large, high-quality datasets such as…
We present an ensemble approach for categorizing search query entities in the recruitment domain. Understanding the types of entities expressed in a search query (Company, Skill, Job Title, etc.) enables more intelligent information…
Traditional information retrieval systems represent documents and queries by keyword sets. However, the content of a document or a query is mainly defined by both keywords and named entities occurring in it. Named entities have ontological…
Topic models aim to reveal latent structures within a corpus of text, typically through the use of term-frequency statistics over bag-of-words representations from documents. In recent years, conceptual entities -- interpretable,…
Word embeddings are rich word representations, which in combination with deep neural networks, lead to large performance gains for many NLP tasks. However, word embeddings are represented by dense, real-valued vectors and they are therefore…
In standard methodology for natural language processing, entities in text are typically embedded in dense vector spaces with pre-trained models. The embeddings produced this way are effective when fed into downstream models, but they…
Named entities in text documents are the names of people, organization, location or other types of objects in the documents that exist in the real world. A persisting research challenge is to use computational techniques to identify such…
We present WISER, a new semantic search engine for expert finding in academia. Our system is unsupervised and it jointly combines classical language modeling techniques, based on text evidences, with the Wikipedia Knowledge Graph, via…
Forms are a widespread type of template-based document used in a great variety of fields including, among others, administration, medicine, finance, or insurance. The automatic extraction of the information included in these documents is…
Word embeddings have gained significant attention as learnable representations of semantic relations between words, and have been shown to improve upon the results of traditional word representations. However, little effort has been devoted…
Traditional retrieval methods have been essential for assessing document similarity but struggle with capturing semantic nuances. Despite advancements in latent semantic analysis (LSA) and deep learning, achieving comprehensive semantic…
Recent advances in machine learning, particularly Large Language Models (LLMs) such as BERT and GPT, provide rich contextual embeddings that improve text representation. However, current document clustering approaches often ignore the…
The similarity between the question and indexed documents is a crucial factor in document retrieval for retrieval-augmented question answering. Although this is typically the only method for obtaining the relevant documents, it is not the…
Entity resolution is a widely studied problem with several proposals to match records across relations. Matching textual content is a widespread task in many applications, such as question answering and search. While recent methods achieve…
Named entity recognition (NER) is the task to identify mentions of rigid designators from text belonging to predefined semantic types such as person, location, organization etc. NER always serves as the foundation for many natural language…